Abstract


This thesis addresses a number of theoretical issues in parallel computation. There
are many open questions relating to what can be done with parallel computers and what
are the most effective techniques to use to develop parallel algorithms. We examine various
problems in the hope of gaining insight into the general questions.

One topic that is investigated is the relationship between sequential and parallel
algorithms. We introduce the concept of a P-complete algorithm to capture what it means
for an algorithm to be inherently sequential. We show that a number of sequential greedy
algorithms are P-complete, including the greedy algorithm for finding a path in a graph.
However, a problem is not necessarily difficult if an algorithm to solve it is P-complete. In
some cases, the natural sequential algorithm is P-complete but a different technique gives
a fast parallel algorithm. This shows that it is necessary to use different techniques for
parallel computation than are used for sequential computation.

We give fast parallel algorithms for a number of simple graph theory problems. The
algorithms illustrate a number of different techniques that are useful for parallel algorithms.
The most important results are that the maximal path problem can be solved in RNC and
that a depth first search tree can be constructed in $O(n^{1/2+\epsilon})$ parallel time. This shows
that substantial speed up is possible for both of these problems using parallelism.

The final topic that we address is parallel approximation of P-complete problems.
P-complete problems probably cannot be solved by fast parallel algorithms. We give a
number of results on approximating P-complete problems with parallel algorithms that are
similar to results on approximating NP-complete problems with sequential algorithms. We give
upper and lower bounds on the degree of approximation that is possible for some problems.
We also investigate the role that numbers play in P-complete problems, showing that some
P-complete problems remain difficult even if the numbers are small.

Chapter 1 Introduction


1.1.  Parallel  Computation 

Parallel computation offers substantial opportunities for performing computations
faster than they can be done with just a single processor. In some cases, problems can
be decomposed into independent subproblems, with each subproblem being solved
simultaneously. Ideally, this allows a speed up in the computation proportional to the number
of processors employed. There are, however, many difficulties in parallel computation.
Some of the difficulties are technological, relating to such issues as processor
synchronization, resource contention, and wiring together processors. Other difficulties are more
algorithmic in nature. These include the partitioning of problems into subproblems and
the programming of parallel machines.

There is a substantial difference between parallel and sequential algorithms.
Sequential algorithms can take advantage of many intermediate results. Processing can be done
one step at a time, basing each decision on the previous decisions made. However, for a
parallel algorithm to be efficient, the problem must be decomposed so that progress can
be made on many subproblems at the same time. This often requires a very different
approach than is used in sequential algorithms.

Two of the most important questions relating to parallel computation are: What
are the problems that can be solved by fast parallel algorithms, and what general
techniques work for parallel algorithms? Essentially the questions are “what” and “how.” The
classification problem is to identify the problems that can be effectively parallelized and
also to identify the problems that require sequential processing. The other problem is to
identify general algorithmic techniques. There are a number of general techniques that
are commonly employed in sequential algorithms, such as divide and conquer and dynamic
programming. It is important to develop a similar set of approaches for parallel algorithms.
There are currently just a few techniques used in parallel algorithms. It is hoped that new
techniques can be developed.

The  goal  of  this  thesis  is  to  study  particular  problems  to  gain  insight  into  these  general 
questions.  We  take  a  theoretical  approach  by  adopting  an  abstract  model  of  parallel 
computation.  The  model  of  computation  that  we  use  is  the  P-RAM  model.  This  is  an 
idealization  of  a  parallel  machine.  Most  of  the  problems  that  we  look  at  are  very  simple 
from  the  sequential  point  of  view.  However,  these  problems  turn  out  to  be  a  challenge  to 
parallelize  and  illustrate  many  of  the  issues  involved  in  parallel  computation. 

The study of parallel computation is very broad, and this thesis addresses issues
related to only a portion of it. We are only concerned with what could be called “inherent
parallelism.” This refers to the parallelism possible in an ideal computer, where unit cost
communication is available between all processors and memories, and there are no
constraints imposed by physical layout. We neglect such issues as communication complexity
and the layout of processors, although they are important, both in theory and in practice.


The next section covers some preliminary results that we base our work on. The
discussion covers our model of parallel computation and also covers some of the methods that
are available to show that problems are inherently sequential. Following the preliminary
material is a discussion of some of the general issues in parallel computation. This thesis
has three chapters of technical results. Chapter 2 is concerned with the relation between
sequential and parallel algorithms. It discusses ways to show that certain algorithms are
inherently sequential. Chapter 3 discusses some algorithms for certain path problems.
The algorithms illustrate a number of important general techniques. Finally, Chapter 4
discusses ways to approximate problems that probably cannot be solved by fast parallel
algorithms. The chapter shows that there is a high degree of similarity between parallel
and sequential complexity theory.

1.2.  Preliminaries 

A substantial amount of work has been done in studying models of parallel
computation in order to identify the appropriate theoretical basis for parallel computation. In this
section we discuss some of the results that are directly relevant to our work. We do not
attempt to give a complete survey of the various models of parallel computation. There
are a number of papers that survey the work that has been done, including papers by Cook
[C1], and Hoover and Ruzzo [HR].

1.2.1.  The  P-RAM  Model 

The standard model of synchronous parallel computation is the P-RAM (Parallel Random
Access Machine). This model captures the intuitive idea of what a parallel machine
is. The P-RAM model has been described by a number of authors [FW] [C2]. A P-RAM
consists of a set of processors, $P_1, \ldots, P_n$, and a set of global memory cells, $M_1, \ldots, M_m$.
Each processor is a RAM [AHU], with its own local memory. A processor can perform
some standard arithmetic operations and can access its own memory with direct or
indirect addressing. A processor can also communicate with the global memory by reading a
value from or writing a value to any global memory cell. The global memory accesses are
assumed to take unit time. This is one of the major idealizations of the P-RAM model. In
a real parallel computer, one would expect that the access time is related to the number
of processors. A P-RAM has a single program that all processors execute one step at a
time. Each processor has a register which contains its processor number and instructions
may depend on this number, so different processors may do different things on the same
instruction. Some of the global memory cells are designated for the input to the problem
and some of them are for the outputs. The time taken for an algorithm is the number of
instructions that are executed. The space is the sum of the number of processors and the
number of memory cells.

There are a number of variants of the P-RAM model that handle concurrent reads and
concurrent writes to the global memory cells differently. The major variants are exclusive
read, exclusive write (EREW), concurrent read, exclusive write (CREW) and concurrent
read, concurrent write (CRCW). For the latter model, there are additional variants on
the nature of concurrent writes. There is a difference in the power of the various types
of P-RAM. For example, it is trivial to compute the OR of n inputs with a CRCW
P-RAM in constant time, while on a CREW P-RAM the problem requires $\Omega(\log n)$ time
[CD]. There are also separation results for the various different types of CRCW P-RAMs
[FRW]. However, the differences in the power of the models are not that great. It is not
hard to show that a single instruction of an n processor, m memory CRCW P-RAM (any
variant) can be simulated by an EREW P-RAM with nm processors and nm memories in
O(log n + log m) time. In this thesis, we use the CREW model. However, we are not that
interested in the exact time or processor bounds of our algorithms, so our results carry
over to the other variants of the P-RAM model.
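
The OR example can be made concrete with a small sequential simulation. The sketch below is illustrative only: the function names are hypothetical, the concurrent-write rule assumed is the "common" one (all simultaneous writers store the same value), and the exclusive-write version is just the obvious balanced-tree method, whose round count grows as log n.

```python
def or_crcw(bits):
    """Simulate one CRCW step: every processor holding a 1 writes 1 to one shared cell."""
    cell = 0                                      # shared output cell, initially 0
    writers = [i for i, b in enumerate(bits) if b]
    for _ in writers:                             # concurrent writes of the same value,
        cell = 1                                  # simulated here one after another
    return cell, 1                                # (result, number of parallel steps)


def or_tree(bits):
    """Simulate exclusive-write tree combining: each round halves the number of cells."""
    cells = list(bits)
    rounds = 0
    while len(cells) > 1:
        # Processor i combines cells 2i and 2i+1; no cell is written by two processors.
        cells = [cells[i] | (cells[i + 1] if i + 1 < len(cells) else 0)
                 for i in range(0, len(cells), 2)]
        rounds += 1
    return (cells[0] if cells else 0), rounds
```

For eight inputs the first function always reports one step, while the second needs three rounds.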


1.2.2.  Fast  Parallel  Algorithms 

In this thesis, we deal with problems that can be solved by “fast” parallel algorithms
that use a “reasonable” number of processors. The generally accepted definition of fast
and reasonable is polylog ($O(\log^k n)$) parallel time and a polynomial number of processors
[P]. This class is commonly referred to as NC. The problems in NC are problems for
which an exponential speedup is possible using parallelism; these problems can have their
running times reduced from polynomial to polylog.

One of the reasons why NC is broadly accepted as the appropriate class to use in the
study of parallelism is that it is a very robust complexity class. NC remains the same
whether it is defined in terms of any variant of the P-RAM model, or in terms of some
other models, such as uniform circuits [B][Ru]. More refined classes, such as problems that
can be solved in $O(\log n)$ or $O(\log^2 n)$ parallel time, depend upon the particular model of
computation that is used. Even the weakest model of a P-RAM is an idealization of the
type of machine that could actually be built. To convert the model to a more realistic
model, such as a bounded degree network [Sc], a slow down of a factor of at least log n
is needed. The theoretical models, however, are accurate models when factors of log n
are ignored. The advantage of NC is that it allows us to ignore the factors of log n that
separate the various models.

The class RNC is the probabilistic analogue of NC. It denotes the set of problems
that can be solved with a probabilistic P-RAM in polylog time with a polynomial number
of processors. There are several ways that randomness can be introduced into the P-RAM
model. For example, the processors can be given coins to flip or certain memory locations
can be assigned random values at the start of the program’s execution.

One of the drawbacks of the class NC is that many NC algorithms are reasonable
only when the number of processors is very large. The basic problem is that $\log^k n$ is not
a slowly growing function when n is small. The following table shows how large n must
be so that $n > \log^k n$ for several values of k.


    k    n

    2
    3
    4
    5    5,690,333
    6    621,201,921

For example, if the constant factors are the same, an $O(\log^3 n)$ algorithm is not
better than an O(n) algorithm until n is about one thousand. The size of n where an
$O(\log^{?} n)$ algorithm is superior to an $O(n^{1/2})$ algorithm is astronomical. Showing that a
problem is in NC is just the first step to getting a practical algorithm for the problem.
The problems in NC can in principle be solved by fast parallel algorithms; they are not
inherently sequential.
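
The crossover points discussed here can be recomputed directly. The following is a minimal sketch, assuming base-2 logarithms and ordinary floating-point arithmetic; the function name `crossover` is hypothetical. It relies on the fact that $n / \log^k n$ is increasing once $n \geq e^k$, so a doubling search followed by a binary search finds the point after which $n > \log^k n$ always holds.

```python
import math


def crossover(k):
    """Smallest n0 such that n > (log2 n)**k for every n >= n0 (requires k >= 2)."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")

    def ok(n):
        return n > math.log2(n) ** k

    lo = math.ceil(math.e ** k)      # from here on, n / log^k n is increasing
    hi = lo
    while not ok(hi):                # doubling search for an upper bound
        lo, hi = hi, 2 * hi
    while lo < hi:                   # binary search for the first n with ok(n)
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi
```

For k = 2 the crossover is 17, since $\log^2 16 = 16$ exactly, and for k = 3 the value lands close to one thousand, in line with the example in the text.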


1.2.3.  The  Parallel  Computation  Thesis 

Parallel time is closely related to sequential space. The parallel computation thesis is:
For all reasonable models of computation, parallel time is polynomially related to sequential
space [G2]. The equivalence of parallel time and sequential space has been proved for a
number of specific models including alternating Turing machines [CKS], circuits [11] and
P-RAMs [FW] [Wy]. Let PTIME(T(n)) denote the class of problems that can be solved in
O(T(n)) time on a CREW P-RAM and DSPACE(S(n)) denote the class of problems that
can be solved in O(S(n)) space on a multitape Turing machine. (For sequential space, only
the work space is counted; the input is given on a separate read-only tape.) It is shown
in [FW] that $\mathrm{PTIME}(T(n)) \subseteq \mathrm{DSPACE}(T^2(n))$ and $\mathrm{DSPACE}(S(n)) \subseteq \mathrm{PTIME}(S(n))$ for
$S(n) \geq \log n$.

The relationship between NC and Turing machine space is:

$$\mathrm{DSPACE}(\log n) \subseteq \mathrm{NSPACE}(\log n) \subseteq \mathrm{NC} \subseteq \bigcup_{k>0} \mathrm{DSPACE}(\log^k n).$$

To show that $\mathrm{DSPACE}(\log n) \subseteq \mathrm{NC}$, it is necessary to show that a Turing machine that
uses O(log n) space can be simulated with a P-RAM with polynomial size. The size is
polynomial since the number of possible states of the O(log n) space Turing machine is
$O(n^k)$. A similar proof can be used to show that an O(log n) space non-deterministic
Turing machine can be simulated by a P-RAM with polynomial size. The space efficient
simulations of a polylog time P-RAM do not in general give polynomial time algorithms.
However, $\mathrm{NC} \subseteq P$, since a polynomial number of processors can be simulated by a single
processor with a polynomial slowdown.
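
The rightmost inclusion in the chain above follows from the [FW] time-to-space simulation. As a worked step (the constant $c$, standing for the exponent in the running time of a particular NC algorithm for a problem $A$, is notation introduced here for illustration):

```latex
\begin{align*}
A \in \mathrm{NC}
  &\implies A \in \mathrm{PTIME}(\log^{c} n) \quad \text{for some constant } c > 0 \\
  &\implies A \in \mathrm{DSPACE}\bigl((\log^{c} n)^{2}\bigr) = \mathrm{DSPACE}(\log^{2c} n) \\
  &\implies A \in \bigcup_{k>0} \mathrm{DSPACE}(\log^{k} n).
\end{align*}
```

The second line uses $\mathrm{PTIME}(T(n)) \subseteq \mathrm{DSPACE}(T^2(n))$ with $T(n) = \log^{c} n$.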


1.2.4.  P-Completeness 

One of the most difficult areas of complexity theory is lower bounds. Very few nontrivial
lower bounds are known for general models of computation; this holds for parallel
computation as well as for sequential computation. A different approach, which has proved
far more successful, is to show problems to be at least as difficult as other problems. The
notion of completeness is a way to identify the most difficult problems in a particular class.
A problem A is complete for a class C if it is in C and all problems in C are reducible to it
by some appropriate form of reduction. If the problem A could be solved efficiently, then
all problems in C could be solved efficiently by using the solution for A.

The parallel computation thesis allows us to apply results on space complexity to
parallel computation. We show problems to be log-space complete for P (P-complete) to
provide evidence that they are difficult to parallelize. A problem is log-space complete for
P if it is in P and all problems in P are reducible to it by log-space reductions. A problem
A is log-space reducible to a problem B if there exists a log-space Turing machine that
converts instances of A into equivalent instances of B. Log-space reducibility is transitive,
i.e., if A is reducible to B and B is reducible to C, then A is reducible to C. This means that
if a P-complete problem could be solved in $O(\log^k n)$ space, then $P \subseteq \mathrm{DSPACE}(\log^k n)$.

If a problem is P-complete, then it is unlikely that there is a fast parallel algorithm for
it. Log-space transformations can be done in O(log n) time on a P-RAM using a polynomial
number of processors, so the P-complete problems are the most difficult problems in P to
parallelize. If a P-complete problem is found to be in NC, then $P = \mathrm{NC}$ and
$P \subseteq \bigcup_{k>0} \mathrm{DSPACE}(\log^k n)$. If this were the case, then all problems in P could be solved very
fast in parallel and could be solved sequentially using very little space. Both of these are
considered to be very unlikely. There is, of course, no known proof that $P \neq \mathrm{NC}$, as there
is no known proof that $P \neq \mathrm{NP}$.

Many problems are known to be P-complete. An important P-completeness result is
that the problem of computing the value of a circuit given its inputs is P-complete [Lad].
We discuss this problem in the next section. Other important P-complete problems are
network flow [GSS], linear programming [DLR], and unification [DKM]. A list of currently
known P-complete problems has been compiled by Hoover and Ruzzo [HR].

P-completeness is defined in terms of language recognition, so according to the
definition, we are restricted to discussing problems that have a yes/no answer. However, it
is often the case that we are interested in computing functions instead of just recognizing
languages. For example, in the network flow problem, we wish to compute the value of
the maximum flow in a network. One way to extend the definition of P-completeness
to functions is to introduce a language associated with the function. For a function
$f : \{0,1\}^* \to \{0,1\}^*$, we define the language

$$L_f = \{(x, k, a) : \text{the } k\text{-th bit of } f(x) \text{ is } a\}.$$

By definition, the problem of computing f is P-complete if the problem of recognizing $L_f$
is P-complete. The proof that network flow is P-complete [GSS] actually shows that the
problem of computing the least significant bit of the maximum flow is P-complete.
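
The link between a function and its associated language can be illustrated in a few lines. This is a sketch under stated assumptions: bit strings are represented as Python strings of '0' and '1', bits are indexed from 0 at the left, the helper names are hypothetical, and `max_len` is an assumed bound on the output length. It shows that a decision procedure for the language is enough to rebuild f(x) one bit at a time.

```python
def make_membership_test(f):
    """Return a yes/no test for {(x, k, a) : the k-th bit of f(x) is a}."""
    def in_language(x, k, a):
        y = f(x)
        return k < len(y) and y[k] == a
    return in_language


def recover(in_language, x, max_len):
    """Rebuild f(x) using only membership queries, one query pair per output bit."""
    out = []
    for k in range(max_len):
        if in_language(x, k, "1"):
            out.append("1")
        elif in_language(x, k, "0"):
            out.append("0")
        else:                        # neither answer holds: f(x) has fewer than k+1 bits
            break
    return "".join(out)
```

For example, with f taken to be binary increment, `recover` returns "100" on input "011". All the bit queries are independent of one another, so they could be asked in parallel.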

1.2.5. The Circuit Value Problem


The fundamental P-complete problem is the circuit value problem. The circuit value
problem  is:  Given  a  circuit  with  values  for  its  inputs,  compute  the  value  of  its  output. 
The  circuit  value  problem  is  clearly  in  P,  since  it  can  be  solved  by  evaluating  the  gates 
one  at  a  time.  Intuitively,  a  problem  is  P-complete  if  it  is  sufficiently  powerful  to  simulate 
the  computation  of  any  polynomial  time  bounded  Turing  machine  on  a  given  input.  The 
proof  that  the  circuit  value  problem  is  P-complete  is  a  generic  reduction  from  an  arbitrary 
problem  in  P. 

A problem can be shown to be P-complete by giving a log-space reduction from a
known P-complete problem to it. The circuit value problem is by far the most frequently
used problem for P-completeness proofs. The reason for this is that the circuit value
problem seems to capture the complexity of P-completeness and the structure of the problem
very often makes it convenient to use in reductions. Its role in P-completeness is similar to
the role of satisfiability in NP-completeness. We now give a precise definition of the circuit
value problem. A circuit is a string $\beta = \beta_1, \ldots, \beta_n$ where $\beta_i$ is either an input (0-INPUT
or 1-INPUT), or a gate AND(j, k), OR(j, k), or NOT(j). The inputs for a gate are lower
numbered gates, thus the gate $\beta_i$ = AND(j, k) receives its inputs from the gates $\beta_j$ and
$\beta_k$ with j < i and k < i. The circuit value problem is to determine if a given string is in
the language of all circuits that evaluate to true.
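
The claim that the circuit value problem is in P rests on evaluating the gates one at a time in increasing order. A minimal sketch of that sequential evaluation follows. The tuple encoding of gates, the function name, and the convention that the last gate is the circuit's output are assumptions made for illustration; the text does not fix them.

```python
def evaluate_circuit(gates):
    """Evaluate a circuit given as a list; gates[i-1] describes the i-th gate.

    Each entry is ("0-INPUT",), ("1-INPUT",), ("AND", j, k), ("OR", j, k)
    or ("NOT", j), where j and k are 1-based indices of earlier gates.
    """
    if not gates:
        raise ValueError("empty circuit")
    value = {}
    for i, (op, *args) in enumerate(gates, start=1):
        if any(not (1 <= j < i) for j in args):
            raise ValueError(f"gate {i} must use lower numbered gates")
        if op == "0-INPUT":
            value[i] = False
        elif op == "1-INPUT":
            value[i] = True
        elif op == "AND":
            value[i] = value[args[0]] and value[args[1]]
        elif op == "OR":
            value[i] = value[args[0]] or value[args[1]]
        elif op == "NOT":
            value[i] = not value[args[0]]
        else:
            raise ValueError(f"unknown gate type {op!r}")
    return value[len(gates)]         # assumed convention: the last gate is the output
```

Each gate is visited once, so the running time is linear in the length of the circuit description. The loop is also a good picture of why the problem looks sequential: gate i cannot be evaluated until its inputs are known.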

There are a number of important variants of the circuit value problem that are
P-complete. The circuit value problem is P-complete for any collection of gates that form
a complete basis, for example {NOT, OR} or {NAND}. The circuit value problem is also
P-complete for monotone circuits, i.e., if the logical gates are AND and OR [G1]. A second
version of the circuit value problem that is P-complete is the planar circuit value problem
[G1]. The problem is to evaluate a circuit that is laid out on the plane without wires
crossing. The inputs are assumed to be along one edge of the circuit. The monotone
planar circuit value problem, however, is apparently not P-complete, since it can be solved
in NC [DC] [Ru].

There are a number of minor restrictions of the circuit value problem that are also
P-complete. These are mentioned because they make a number of P-completeness proofs
cleaner. The first restriction is that the gates are limited to having fanout at most two.
It is not hard to simulate arbitrary fanout with a fanout two circuit by introducing extra
gates. In all of the P-completeness proofs we give in this thesis we assume the logical gates
are restricted to fanout two. We also assume that the inputs (0-INPUT and 1-INPUT)
have fanout one. A second restriction is that the circuit can be assumed to be laid out
in levels, with each gate connected only to gates on adjacent levels. The planar circuit
value problem remains P-complete with this restriction even when the gates are restricted
to NOT and OR (a one input OR is allowed). This variant is used in a proof in the next
chapter.

P-completeness proofs are very similar to NP-completeness proofs. First, the problem
must be shown to be in P. In most cases of interest, this is obvious, linear programming
being a notable exception. For a reduction from the circuit value problem, it is necessary
to simulate a circuit. This entails having a way to represent the values true and false. It is
also necessary to be able to combine values to simulate the gates. Often, the difficult part
of a P-completeness proof is fanning out values. To fan out a value, it must be replicated
so that two other gates can receive the value. A technical detail in P-completeness proofs
is to make sure that values are only propagated in the proper direction. If care is not
taken, values may be propagated backwards, interfering with earlier gates. A final issue in
P-completeness proofs is that the reduction must be a log-space reduction. The types of
local transformations that are commonly done in NP-completeness proofs can be done in
log-space. The proof that a transformation can be done in log-space is generally omitted
from a P-completeness proof.


1.3.  Parallel  and  Sequential  Algorithms 

There are essentially two ways to design a parallel algorithm. One can either start with
a sequential algorithm for the problem and attempt to adapt it to a parallel machine, or one
can start from scratch and design a parallel algorithm. The first approach is appealing for
a number of reasons. There has been a vast amount of work done on developing sequential
algorithms, so it is hoped that some of it carries over to parallel computation. Some general
techniques in sequential computation, such as divide and conquer, involve partitioning
problems into independent subproblems. Some of these algorithms have natural parallel
analogues. There are a few theoretical results that show that certain classes of computation
can be converted to parallel algorithms. For example, it is known that programs that
compute certain polynomials can be converted to fast parallel algorithms [VSBR]. There
is also a practical interest in converting sequential algorithms to parallel algorithms. Over
the years, many programs have been written for sequential computers. Many people want
compilers that will compile the code for parallel machines, to avoid having to rewrite the
code. In certain domains, such as numerical computation, this approach is likely to be at
least partially successful. It is possible to identify a certain amount of parallelism in vector
computations automatically.

The approach of directly converting sequential algorithms to parallel algorithms has
its limitations. Some sequential algorithms process information in a way that seems to be
inherently sequential. Each step may directly depend upon the previous step, so it is not
possible to decompose the computation into independent sub-computations. In Chapter 2,
we investigate the relationship between sequential and parallel algorithms. We introduce
the notion of a P-complete algorithm. This gives us a way to identify inherently sequential
algorithms. A P-complete algorithm cannot be converted to a fast parallel algorithm unless
$P = \mathrm{NC}$. We give a number of examples of simple algorithms that are P-complete.

Showing that an algorithm is inherently sequential does not show that the problem
that the algorithm solves is necessarily difficult. There are a number of problems where
the natural sequential algorithm is P-complete, but a different approach can be used to
construct a fast parallel algorithm. In cases where an algorithm is P-complete, it is
necessary to start from scratch to attempt to find a fast parallel algorithm. This shows that
in some cases completely different approaches are needed for parallel algorithms than are
used for sequential algorithms.


1.4.  Techniques  for  Parallel  Algorithms 


The techniques that are used for parallel algorithms are quite limited. The technique
that  is  most  commonly  used  is  referred  to  as  path  doubling.  The  essential  idea  in  path 
doubling  is  that  at  each  phase  a  processor  doubles  the  amount  of  information  that  it  has. 
For  example,  in  a  summation  algorithm,  each  step  doubles  the  number  of  values  for  which 
a  processor  has  the  sum.  A  second  example  is  traversing  a  linked  list.  There  is  a  processor 
associated  with  each  item  in  the  list,  and  the  processor  has  a  pointer  to  another  item.  Each 
step  of  the  algorithm  doubles  the  distance  that  is  covered  by  each  pointer.  Path  doubling 
also  appears  in  more  sophisticated  guises.  For  example,  it  is  used  by  Helmbold  and  Mayr 
in  their  algorithm  to  compute  an  optimal  two  processor  schedule  [HM2]. 
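
The linked list example can be sketched as a sequential simulation of the synchronous rounds. The code below is illustrative: the function name and the array representation (`succ[i]` is the item after i, with the last item pointing to itself) are assumptions. Each round every item replaces its pointer by its successor's pointer and adds the successor's distance to its own, so the distance covered by each pointer doubles.

```python
def list_rank(succ):
    """Return (rank, rounds): rank[i] is the number of links from item i to the end.

    succ[i] is the successor of item i; the final item is its own successor.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Continue while some pointer has not yet reached the end of the list.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Synchronous step: every processor reads the old arrays, then all write.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds
```

On a list of eight items in order, the loop finishes after three rounds, compared with seven steps for a sequential traversal.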

One  of  the  promising  developments  in  parallel  algorithms  is  that  new  techniques  are 
being  discovered.  One  of  the  major  new  techniques  is  what  we  refer  to  as  the  iterated 
improvement  strategy.  Instead  of  seeking  to  solve  the  problem  in  one  shot,  an  iterated 
improvement  algorithm  builds  its  solution  in  a  number  of  phases.  Often,  each  phase  of 
an  iterated  improvement  algorithm  reduces  the  number  of  candidates  for  the  solution  by 
a constant fraction, so that there are only O(log n) phases. One of the first uses of this
approach  was  by  Karp  and  Wigderson  in  their  maximal  independent  set  algorithm  [KW]. 
A  maximal  independent  set  in  a  graph  is  a  maximal  set  of  vertices  with  no  edges  between 
them.  The  algorithm  maintains  a  set  I  which  is  eventually  a  maximal  independent  set.  At 
each phase, the number of vertices that are not in I or adjacent to members of I is reduced
significantly. A second important algorithm that uses iterated improvement is the
Karp-Upfal-Wigderson matching algorithm [KUW]. This algorithm finds a perfect matching by
identifying  subsets  of  the  edges  that  are  contained  in  a  perfect  matching.  Once  edges  are 
put  into  the  solution  set,  they  are  not  removed.  This  method  is  in  sharp  contrast  to  the 
sequential  algorithms  for  matching  which  move  edges  into  and  out  of  the  solution  [PS]. 

The  use  of  randomness  has  been  gaining  popularity  in  parallel  algorithms.  Quite  often 
it  is  possible  to  generate  certain  objects  with  random  choices,  but  it  seems  more  difficult 
to  do  it  deterministically.  A  typical  situation  is  for  random  choices  to  be  good  with  high 
probability, but to guarantee that the choices are good requires basing each choice on
the  other  choices  made.  Randomness  seems  to  reduce  decisions  from  being  global  to  being 
local.  Probabilistic  techniques  are  often  used  in  conjunction  with  the  iterated  improvement 
strategy.  In  some  cases,  it  is  possible  to  get  rid  of  the  randomness  by  showing  that  a  small 
sample space is sufficient, and then searching the sample space exhaustively [KW][Lu].
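
The combination of random local choices with iterated improvement can be illustrated on the maximal independent set problem. The sketch below is not the algorithm of [KW]; it is a simple random-priority scheme chosen for illustration, simulated sequentially. The function name is hypothetical, and the graph is assumed to be a dict mapping each vertex to the set of its neighbors, symmetric and without self-loops.

```python
import random


def maximal_independent_set(adj, seed=0):
    """Return (independent_set, phases) using random priorities in each phase."""
    rng = random.Random(seed)
    remaining = set(adj)             # vertices not yet in the set or next to it
    independent = set()
    phases = 0
    while remaining:
        # Local random choice: each remaining vertex draws a priority.
        priority = {v: rng.random() for v in remaining}
        # A vertex joins if it beats every remaining neighbor; two adjacent
        # vertices can never both win, so the set stays independent.
        winners = {v for v in remaining
                   if all(priority[v] < priority[u]
                          for u in adj[v] if u in remaining)}
        independent |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & remaining
        remaining -= removed         # vertices are never put back once removed
        phases += 1
    return independent, phases
```

Each vertex decides using only its own neighborhood, and vertices placed in the set are never removed, which matches the style of the algorithms described above.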

In Chapter 3 of this thesis, we look at parallel algorithms for some path problems
and  use  some  of  these  techniques.  The  path  problems  that  we  look  at  can  be  solved  by 
simple  sequential  algorithms  but  are  more  difficult  to  solve  with  fast  parallel  algorithms. 
Much  of  our  work  on  path  problems  was  motivated  by  the  problem  of  depth  first  search. 
A  related  problem  is  to  compute  a  maximal  path.  A  maximal  path  is  a  simple  path  that 
cannot be extended. Our major results of the chapter are that a maximal path can be
found by an RNC algorithm, and that a depth first search tree can be constructed in time
$O(n^{1/2+\epsilon})$ for an n vertex graph. We also give algorithms for some other path problems.
The  algorithms  employ  a  number  of  the  new  techniques.  None  of  them  depend  directly  on 
path  doubling. 

1.5. Coping with P-Completeness


P-complete  problems  probably  cannot  be  solved  by  fast  parallel  algorithms.  However, 
it  is  still  important  to  see  what  can  be  done  with  these  problems  using  parallelism.  One 
approach is to look for algorithms that are substantially faster than the known sequential
algorithms. Algorithms which run in sublinear time, say $n^{1/2}$, can be of practical importance.
This is in contrast to the analogous situation for sequential algorithms, where superpolynomial
algorithms are rarely practical. A second approach is to look for an approximate
solution to the problem. In some cases, a solution that is close to the desired solution
can  be  found  by  a  fast  parallel  algorithm. 

There has been a substantial amount of work done on the approximation of NP-complete
problems. For some problems, there are polynomial time algorithms which find
solutions that are close to the optimal solution. There are also results which give lower
bounds on the degree of approximation that is possible assuming that $P \neq NP$. In Chapter
4 we study the parallel approximation of P-complete problems. Our results are similar to
the results on approximating NP-complete problems. One problem that we look at is
finding  a  subgraph  of  a  graph  that  has  all  vertices  with  high  degree.  For  a  variant  of 
the  problem  we  establish  bounds  on  the  degree  of  approximation  that  is  possible.  The 
similarity  between  sequential  and  parallel  approximation  is  particularly  strong  for  number 
problems. Some NP-complete number problems are tractable if the numbers involved
are small. This has motivated the distinction between strong and weak NP-completeness
[GJ2]. We make the same distinction for P-complete problems and give an example of a
strongly  P-complete  problem. 


1.6.  Notation  and  Conventions 

Many  of  the  problems  that  we  look  at  in  this  thesis  are  simple  graph  theory  problems. 
We use fairly standard notation. We generally denote a graph by $G = (V, E)$ where $V$
is the set of vertices and $E$ is the set of edges. Where we neglect to state it explicitly,
the number of vertices is $n$. We make frequent use of the notation that identifies a set of
vertices with the induced subgraph on the vertices. For $V' \subseteq V$, the induced subgraph is
the graph $G' = (V', (V' \times V') \cap E)$.
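
The induced subgraph notation can be illustrated with a small Python sketch. This is only an illustration under assumed conventions: the function name `induced_subgraph` and the representation of a graph as a set of vertex pairs are hypothetical choices, not part of the thesis.

```python
def induced_subgraph(edges, subset):
    """Return the edge set of the subgraph induced by `subset`.

    `edges` is an iterable of (u, v) pairs and `subset` an iterable of
    vertices (both representations are assumptions of this sketch).
    An edge is kept exactly when both of its endpoints lie in the subset,
    which mirrors the formula (V' x V') intersected with E.
    """
    keep = set(subset)
    return {(u, v) for (u, v) in edges if u in keep and v in keep}


# A 4-cycle 1-2-3-4-1 restricted to the vertices {1, 2, 3}.
cycle = {(1, 2), (2, 3), (3, 4), (1, 4)}
sub = induced_subgraph(cycle, {1, 2, 3})
```

Here `sub` contains only the two edges whose endpoints both survive the restriction.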

In  this  thesis,  we  describe  algorithms  at  a  moderately  high  level.  In  most  cases  our 
interest is to show that a problem can be solved in NC, as opposed to giving the best
possible parallel algorithm for the problem. We are not greatly concerned with the exact
power of the logarithm in an algorithm's running time or the degree of the polynomial
for the number of processors used. In cases where we do claim explicit time and processor
bounds, they are with respect to a CREW P-RAM implementation.

In  our  algorithms,  we  take  advantage  of  many  known  parallel  algorithms  for  graph 
problems.  A  very  important  algorithm  that  we  use  a  number  of  times  is  the  Karp-Upfal- 
Wigderson matching algorithm. The algorithm is an RNC algorithm that finds a maximum
cardinality  matching  in  a  graph.  The  matching  algorithm  can  be  used  to  solve  a  number 
of  other  important  problems.  For  example,  it  can  be  used  to  find  a  maximum  flow  in  a 
unit capacity network. We also use quite a few subroutines to solve simple graph theory
problems. We use algorithms for such problems as finding connected components [V],
finding  articulation  points  [TV],  and  finding  a  shortest  path  between  two  vertices.  We  also 
rely  on  parallel  algorithms  for  maintaining  data  structures  and  manipulating  graphs.  We 
do  not  go  into  the  details  of  these  operations.  The  parallel  algorithms  for  these  problems  are 
not very complicated, especially when we are only concerned with getting NC algorithms,
as  opposed  to  getting  the  best  algorithms  possible. 

We describe our algorithms in a Pascal-like language. Many of the statements are
English language descriptions. It would not be difficult to convert these to give a more detailed
implementation of the algorithms. Some of our algorithms appear rather sequential;
their parallelism arises from the parallel implementation of the individual statements. We
do  use  a  few  explicit  parallel  control  structures  in  our  algorithm  descriptions.  We  use  a 
statement  which  has  the  form:  for  each  x  do  in  parallel.  This  has  the  natural  meaning 
of  running  independent  copies  of  the  routine  for  each  x  and  then  combining  the  results 
when  the  routines  are  all  done. 
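
The meaning of the for each x do in parallel statement can be sketched in Python. This is a rough illustration only: the helper name `for_each_in_parallel` is hypothetical, and a thread pool merely imitates the semantics (independent copies, results combined at the end); it is not a model of a P-RAM.

```python
from concurrent.futures import ThreadPoolExecutor


def for_each_in_parallel(routine, items):
    """Run routine(x) independently for each x, then combine the results.

    The results are returned in the order of `items`, and the call only
    returns once every copy of the routine has finished.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(routine, items))


# Example: compute the degree of every vertex from its adjacency list.
adjacency_lists = [[2, 3], [1], [1, 4, 5]]
degrees = for_each_in_parallel(len, adjacency_lists)
```

The copies must not depend on one another's results, which is the property the statement relies on.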


Chapter  2  P-Complete  Algorithms 


2.1.  Introduction 

One  of  the  interesting  and  challenging  aspects  of  parallel  computation  is  that  different 
techniques  need  to  be  used  for  parallel  algorithms  from  those  that  are  used  for  sequential 
algorithms.  In  sequential  algorithms,  processing  is  done  one  step  at  a  time.  This  allows 
solutions  to  be  constructed  in  many  phases,  with  choices  depending  upon  all  of  the  earlier 
choices  that  were  made.  However,  to  get  fast  parallel  algorithms,  many  choices  need  to 
be made simultaneously. The computations need to be localized, with only a small dependence
between the various components of the computation. In this chapter, some simple
sequential  algorithms  are  examined.  Strong  evidence  is  presented  that  the  techniques  used 
in  these  algorithms  are  often  inherently  sequential,  so  there  is  little  hope  that  they  can  be 
sped  up  substantially  with  parallelism. 

Some sequential algorithms have fairly direct parallel counterparts. The parallel algorithm
can be thought of as a parallel implementation of the sequential algorithm. A trivial
example is matrix multiplication. The straightforward sequential algorithm computes the
entries of the product one at a time, while the parallel algorithm computes the entries
simultaneously. However, other sequential algorithms take advantage of being able to base
decisions on accumulated information. For example, the sequential algorithms for matching
start with an initial solution and improve the solution through a number of phases
of augmentation. The known parallel algorithms [KUW] for matching take a completely
different  approach. 

In this chapter we formalize what it means for an algorithm to be inherently sequential
by relating computations to certain P-complete problems. The term “P-complete
algorithm” is introduced to describe these algorithms. This provides strong evidence that
some algorithms are inherently sequential. When we show that a certain approach to a
problem probably cannot yield a fast parallel solution, this does not imply that the problem
is difficult. There are a number of examples of problems where the natural sequential
algorithm  is  P-complete,  but  a  different  approach  can  be  used  to  construct  a  fast  parallel 
algorithm. 

The sequential algorithms that we examine are greedy algorithms. The greedy paradigm
is a very important general technique used in many sequential algorithms. Examples
of greedy algorithms include Kruskal's minimum spanning tree algorithm [K] and the well
known algorithm for depth first search [T]. A greedy algorithm is one which builds its
solution one step at a time. Items are added to the partial solution by picking the “best”
choice by some generally simple criterion. Once an item is added to the solution, it will not
be discarded, so there is no backtracking. Greedy algorithms often seem very sequential
in nature, since the choice of which item to add to the solution set frequently depends
on many of the previous choices. In this chapter we show that the greedy algorithms for
several simple problems are P-complete. We show that the greedy algorithms for finding
a maximal path, for finding a set of disjoint paths, and for approximating a maximum cut
are all P-complete.


2.2.  Definition  of  P-completeness  for  Algorithms 

The purpose of extending the notion of P-completeness to algorithms is to be able to
capture the idea that an algorithm is (probably) inherently sequential. The definition of
P-completeness for algorithms that we give must in some way capture what it means to
implement a sequential algorithm as a parallel algorithm. In other words, the definition
must establish some kind of correspondence between sequential and parallel algorithms.
To justify the term “P-completeness,” our definition should give as much evidence for
the algorithm being inherently sequential as there is that a P-complete problem cannot
be solved by a fast parallel algorithm. To do this, our definition should imply that if a
P-complete algorithm could be implemented as an NC algorithm then P = NC. There are
a number of different ways that P-completeness can be defined for algorithms. The basic
idea in the definitions is that the problem of performing the same computation as is done
by the sequential algorithm is a P-complete problem.

One  way  to  define  an  algorithm  to  be  P-complete  is  in  terms  of  the  full  computation  of 
a  Turing  machine.  The  computation  of  a  polynomial  time  Turing  machine  on  a  particular 
input can be summarized by a string of polynomial length. For a Turing machine $M$,
this string can be viewed as a function $f_M(x)$ of the input $x$. A possible definition of
P-completeness of an algorithm $A$ with Turing machine $M$ is: $A$ is P-complete if the
problem of computing $f_M(x)$ is P-complete. The drawback to this approach is that it is
heavily  dependent  on  the  actual  Turing  machine  corresponding  to  an  algorithm.  It  is  rarely 
desirable  to  have  to  describe  algorithms  in  terms  of  Turing  machine  implementations.  The 
advantage  of  this  approach  is  that  it  fully  captures  the  computation  of  the  algorithm. 

The definition of a P-complete algorithm that we use is based on functions that solve
search problems. A search problem $\Pi$ consists of a set of instances $D_\Pi$ and a set of solutions
$S[I]$ for each $I \in D_\Pi$ [CJJ3 pp. 110]. The problem can be viewed as a relation
$R = \{(x, y) \mid x \in D_\Pi, y \in S[x]\}$. An algorithm for a search problem is a function $f$ such that
$(x, f(x)) \in R$. A simple example of a search problem is the spanning tree problem. The
solutions for a graph $G$ form the set of all spanning trees of $G$. An algorithm that solves
the spanning tree problem is one that finds some spanning tree. For a search problem,
there are many possible algorithms that solve the problem. Our definition of a P-complete
algorithm  is: 

Definition 2.1. An algorithm A for a search problem is P-complete if the problem of
computing  the  solution  found  by  A  is  P-complete. 

A shortcoming of our definition of a P-complete algorithm is that it does not immediately
relate to the internal computation of the algorithm. For some algorithms it is the
method used to get the answer that appears sequential in nature. The way to handle this
with our current definition is to redefine the result of the algorithm so that it includes a
trace of the computation. This means that we include with the result a list of certain internal
states of the computation. For example, if an algorithm computes the partial solutions
$S_1, \ldots, S_n$ on its way to the solution $S$, then we could define the result as $S_1, \ldots, S_n, S$.
This  allows  us  to  handle  most  cases  of  interest  with  our  definition  of  P-completeness  for 
algorithms  without  having  to  deal  with  the  details  of  Turing  machine  implementation. 


In order to prove that an algorithm is P-complete, it is necessary to have a fairly precise
statement  of  the  algorithm.  When  describing  an  algorithm,  it  is  common  practice  to  leave 
certain  steps  unspecified  with  statements  like  “pick  an  unmarked  vertex.”  This  is  done 
when  several  choices  are  satisfactory  and  there  is  no  need  to  encumber  the  description  with 
superfluous  detail.  When  implementing  the  algorithm,  some  arbitrary  choices  need  to  be 
made.  Often  a  reasonable  choice  to  make  is  to  choose  the  lowest  numbered  element.  When 
this  choice  is  made,  frequently  the  resulting  solution  is  the  lexicographically  minimum 
solution. The natural lexicographic order on strings is $s_1 \cdots s_j <_{lex} t_1 \cdots t_k$ if $s_1 \cdots s_j = t_1 \cdots t_j$
and $j < k$, or $s_1 \cdots s_i = t_1 \cdots t_i$ and $s_{i+1} < t_{i+1}$. We shall use the word lexmin in
place  of  the  cumbersome  phrase  “lexicographically  minimum.”  For  graph  problems  with 
edges  represented  by  adjacency  lists,  another  reasonable  approach  for  unspecified  choices  of 
edges  is  to  take  the  first  available  edge  from  a  list.  The  result  of  this  approach  is  generally 
equivalent to choosing the lowest numbered adjacent vertex when the edges are ordered in
the  list  by  vertex  numbers. 
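
The lexicographic order just defined can be written out as a short Python sketch. The function names `lex_less` and `lexmin` are hypothetical, and sequences of comparable labels stand in for strings; this is an illustration of the definition, not a routine from the thesis.

```python
def lex_less(s, t):
    """Return True when s precedes t in the lexicographic order.

    Either the two sequences first differ at some position and s has the
    smaller symbol there, or s is a proper prefix of t.
    """
    for a, b in zip(s, t):
        if a != b:
            return a < b
    return len(s) < len(t)


def lexmin(candidates):
    """Return the lexicographically minimum sequence among the candidates."""
    best = None
    for c in candidates:
        if best is None or lex_less(c, best):
            best = c
    return best


# Paths written as sequences of vertex numbers.
smallest = lexmin([[2, 1], [1, 3], [1, 2, 4]])
```

In this example `smallest` is `[1, 2, 4]`, since it agrees with `[1, 3]` on the first vertex and has the smaller second vertex.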

An example of a P-complete algorithm is the greedy algorithm for finding a maximal
independent set. For a graph $G = (V, E)$ an independent set $I$ is a subset of the vertices
such that there are no edges between vertices in $I$. A maximal¹ independent set is an
independent set that is not properly contained in any other independent set. The sequential
algorithm constructs a maximal independent set by considering the vertices one at a time.
If a vertex is not adjacent to the independent set when it is considered, then it is added to
the independent set. The algorithm is:

MaximalIndependentSet(G)
begin
    I ← ∅;
    for i ← 1 to |V| do
        if v_i ∉ N(I) then
            I ← I ∪ {v_i};
end.

One  approach  to  designing  a  fast  parallel  algorithm  for  finding  a  maximal  independent  set 
is to attempt to implement the sequential algorithm as a parallel algorithm. The algorithm
would have to decide whether or not to include an element $v_i$ in $I$ without building $I$ step
by step. However, this approach is not likely to be successful. The solution that is found
by this algorithm is the lexmin solution. The problem of computing the lexmin maximal
independent set is P-complete, so the algorithm is a P-complete algorithm. This result
is due to Cook, who showed that the complementary problem of computing the lexmin
maximal clique is P-complete [C2]. Although this sequential algorithm apparently cannot
be used to create a fast parallel algorithm, a different approach can be used to construct
a fast parallel algorithm. Wigderson and Karp [KW] developed a probabilistic parallel
algorithm for constructing a maximal independent set. Their algorithm can be converted
into a deterministic algorithm, so the problem can be solved in NC. A simpler maximal
independent  set  algorithm  has  been  found  by  Luby  [Lu]. 
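
The sequential greedy algorithm for a maximal independent set can be made concrete with a small Python sketch. The adjacency-dictionary representation and the function name are assumptions of this illustration; vertices are taken to be numbered 1 to n.

```python
def greedy_maximal_independent_set(n, adj):
    """Greedy (lexmin) maximal independent set on vertices 1..n.

    `adj` maps each vertex to the set of its neighbors (an assumed
    representation). Vertices are considered in increasing order, and a
    vertex is added exactly when none of its neighbors has been chosen.
    """
    independent = set()
    for v in range(1, n + 1):
        if not (adj.get(v, set()) & independent):
            independent.add(v)
    return independent


# The path 1 - 2 - 3 - 4.
path_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
result = greedy_maximal_independent_set(4, path_graph)
```

On the path above the sketch returns `{1, 3}`. The decision for vertex 3 depends on the earlier decision to skip vertex 2, which in turn depends on vertex 1; this chain of dependencies is what makes the computation look sequential.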

1  In  this  thesis,  we  use  maximal  to  denote  something  that  cannot  be  extended,  and  we  use  maximum 
to  indicate  maximum  cardinality. 


2.3.  Finding  a  Maximal  Path 


The  first  algorithm  that  we  show  to  be  P-complete  is  a  simple  algorithm  for  finding 
a  path  in  a  graph.  The  algorithm  builds  a  path  one  vertex  at  a  time  by  going  from  the 
current  endpoint  to  its  lowest  numbered  neighbor  that  is  not  already  on  the  path.  The 
algorithm  runs  until  a  vertex  is  encountered  that  has  all  of  its  neighbors  on  the  path  so 
that the path cannot be extended. A path that cannot be extended is a maximal path.
The  greedy  algorithm  for  finding  a  maximal  path  starting  at  a  given  vertex  r  is: 

GreedyMaximalPath(G, r)
begin
    P ← r; v ← r;
    while v has an unvisited neighbor do
    begin
        w ← lowest numbered unvisited neighbor of v;
        P ← Pw;
        v ← w;
    end
end.

The  greedy  algorithm  computes  the  lexmin  maximal  path.  We  show  that  computing 
the  lexmin  maximal  path  is  a  P-complete  problem,  so  that  this  algorithm  is  P-complete. 
However, this does not mean that a maximal path cannot be found by a fast parallel
algorithm. The next chapter gives an RNC algorithm for finding a maximal path. The
greedy algorithm for finding a maximal path is closely related to the algorithm for finding
a depth first search tree of a graph. The greedy maximal path algorithm finds the initial
branch of the lexmin depth first search tree, so our results imply that the greedy algorithm
for depth first search is P-complete. The original proof that computing the lexmin depth
first search tree is P-complete is due to Reif [Re].
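
The greedy maximal path algorithm translates directly into Python. The following is a sketch under assumed conventions: the graph is a dictionary from each vertex to its neighbors (out-neighbors for a directed graph), and the function name is hypothetical.

```python
def greedy_maximal_path(adj, r):
    """Return the lexmin maximal path starting at vertex r.

    `adj` maps a vertex to an iterable of its neighbors (assumed
    representation). From the current endpoint the path always moves to
    the lowest numbered neighbor that is not yet on the path, and it
    stops when no such neighbor exists.
    """
    path = [r]
    visited = {r}
    v = r
    while True:
        unvisited = [w for w in adj.get(v, ()) if w not in visited]
        if not unvisited:
            return path
        w = min(unvisited)
        path.append(w)
        visited.add(w)
        v = w


# Vertex 1 is adjacent to 2 and 3; vertex 3 is also adjacent to 4.
graph = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
short_path = greedy_maximal_path(graph, 1)
```

Here the path is `[1, 2]`: it cannot be extended from vertex 2, so it is maximal, even though the longer path 1, 3, 4 exists. The path found is maximal, not maximum.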


2.3.1.  Directed  Lexmin  Maximal  Path 

We show that the problem of computing the maximal path found by the greedy algorithm
is P-complete. We first show the result for directed graphs and then for undirected
planar graphs. The proof for directed graphs is simpler and conveys the intuition of why
the problem is difficult better than the proof for planar graphs. The second result is
stronger since it applies to a very restrictive class of graphs. Finding the lexmin maximal
path in an undirected graph is a special case of the problem for directed graphs, since an
undirected edge can be viewed as a pair of directed edges.

Theorem 2.1. The problem of computing the lexmin maximal path is P-complete for
directed  graphs. 

Proof: The proof is a reduction from the monotone circuit value problem. Let
$\beta = \beta_1, \ldots, \beta_n$ be an instance of the monotone circuit value problem. The circuit $\beta$ is transformed
in log-space to a graph with distinguished vertices $r$ and $v$ such that $v$ will be on
the lexmin path from $r$ if and only if the circuit evaluates to true.


For each gate $\beta_k$ there is a collection of vertices. A gate is simulated by the way that
the lexmin path passes through the vertices corresponding to that gate. The gates are
evaluated in order, with the path first passing through the vertices for $\beta_1$, then $\beta_2$, and
so on. There are vertices $k_{in}$ and $k_{out}$ in the collection of vertices for $\beta_k$. The segment of
the lexmin path between the vertices $k_{in}$ and $k_{out}$ visits certain vertices to test the values
of the inputs to $\beta_k$ and then visits other vertices to indicate the value of the output of the
gate.

A  key  component  of  the  simulation  is  a  switch  which  is  used  to  indicate  the  value  of  a 
wire.  For  each  gate  there  is  one  switch  for  each  output.  The  vertices  of  these  switches  are 
traversed during the simulation of the gate to indicate a true value, and they are bypassed
to indicate a false value. If a switch for $\beta_k$ is not visited when simulating $\beta_k$, it might be
traversed when simulating a gate $\beta_j$ which receives an input from $\beta_k$.

The gadgets for the gates are shown in the figures below. In the figures, the switches
are the groups of four vertices drawn in a square. The gadget for gate $\beta_k$ is connected to
the gadget for $\beta_{k+1}$ by an edge from $k_{out}$ to $(k+1)_{in}$. If a gate $\beta_k$ receives an input from
gate $\beta_i$, then the gadget for $\beta_k$ is connected to the output switch of $\beta_i$. The AND and OR
gates are illustrated as 2-output gates $\beta_k$ that receive inputs from $\beta_i$ and $\beta_j$. The vertices
of the graph that is constructed are labelled so that the labels of the vertices associated
with $\beta_i$ are less than those associated with $\beta_k$ for $i < k$. In addition, within gate $\beta_k$, the
labels are as indicated in the figures, where $k < k_r < k_{r+1}$.


[Figure: gadgets for a 0-INPUT and a 1-INPUT]


The circuit is simulated by constructing the lexmin path starting at the vertex $1_{in}$.
From a vertex $v$ the path goes to the lowest numbered neighbor of $v$ that is not already
on the path. For a 0-INPUT, the path goes directly from $k_{in}$ to $k_{out}$, and for a 1-INPUT the
path traverses the switch on its route from $k_{in}$ to $k_{out}$. For an AND gate, if either of the
input switches has not been traversed when the path gets to $k_{in}$ (so the gate is receiving
a false input), then the path goes through an input switch to $k_{out}$, bypassing the output
switches. For an OR gate, if either of the switches has been visited then the path will go
through the output switches. In the illustration of an OR gate, the highlighted path shows
what happens when the gate receives a false input from $\beta_i$ and a true input from $\beta_j$. If
the lexmin path visits a vertex in the output switch of the final gate $\beta_n$, then the circuit
evaluates to true, and if the path does not visit the output switch, the circuit evaluates to
false. ∎
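
For reference, the monotone circuit value problem that the reduction starts from can be stated operationally. The Python sketch below is an illustration only: the tuple encoding of gates and the function name are assumptions, and it evaluates the circuit directly instead of building the graph of the proof.

```python
def evaluate_monotone_circuit(gates):
    """Evaluate a monotone circuit given as a list of gates.

    Gate k (1-indexed) is one of the following tuples (assumed encoding):
      ("INPUT", value)  a constant 0-INPUT or 1-INPUT,
      ("AND", i, j)     with i, j < k,
      ("OR", i, j)      with i, j < k.
    The gates are evaluated in order, as in the simulation by the lexmin
    path, and the list of all gate values is returned.
    """
    values = []
    for gate in gates:
        kind = gate[0]
        if kind == "INPUT":
            values.append(bool(gate[1]))
        elif kind == "AND":
            values.append(values[gate[1] - 1] and values[gate[2] - 1])
        elif kind == "OR":
            values.append(values[gate[1] - 1] or values[gate[2] - 1])
        else:
            raise ValueError("unknown gate type: %r" % (kind,))
    return values


circuit = [("INPUT", False), ("INPUT", True), ("OR", 1, 2), ("AND", 1, 3)]
gate_values = evaluate_monotone_circuit(circuit)
```

The value of the circuit is the value of its final gate, `gate_values[-1]`, which is false for this example. In the reduction, a true gate corresponds to the lexmin path traversing that gate's output switches.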


[Figure: gadget for an OR gate]


2.3.2.  Planar  Lexmin  Path 

The P-completeness result can be strengthened by showing that it applies to a very
restricted class of graphs. We show that computing the lexmin path is P-complete even
for undirected planar graphs with maximum degree three. Many graph problems appear
to be much easier when they are restricted to planar graphs. For example, both depth
first search [Sm] and network flow [JV] can be solved in NC for planar graphs. The
P-completeness proof that we give is similar to the previous proof. We reduce from a variant
of the circuit value problem and simulate the gates in the same manner as above. The
circuit value problem that we use is for a layered planar circuit made up of NOT and OR
gates. The gates are on levels, with the gate $\beta_{k+1}$ immediately to the right of $\beta_k$ unless $\beta_k$
is the rightmost gate on a level, in which case $\beta_{k+1}$ is the leftmost gate on the next level.
The wires between gates run only between consecutive levels and the wires do not cross.


Theorem  2.2.  The  problem  of  computing  the  lexmin  maximal  path  in  undirected  planar 
graphs with maximum degree three is P-complete.

Proof:  We  prove  this  theorem  by  giving  a  log-space  transformation  from  the  circuit  value 
problem  for  layered  planar  OR-NOT  circuits.  The  planar  circuit  value  problem  remains 
P-complete with this restriction. The reduction of an arbitrary circuit to a planar circuit
[Cl] can be modified so that this type of planar circuit is generated.

The construction is the same as used in Theorem 2.1 except that we use the gadgets
below. The gadgets for the gates can be put in levels and the connections to the switches
can be done without edge crossings. The only violation of planarity is the edge between
$k_{out}$ and $(k+1)_{in}$ when $\beta_k$ and $\beta_{k+1}$ are on separate levels. This problem can be solved
by laying the graph out on a cylinder instead of the plane. The cylinder can then be
projected onto the plane to achieve a planar layout. The vertex labels shown in the figures
are just 0, 1, 2, 3, and 4. The vertex labels can be made unique by replacing each label $k$
($k \in \{0, \ldots, 4\}$) by a label in $\{kn, \ldots, (k+1)n\}$.

The circuit is again simulated by computing the lexmin path from the vertex $1_{in}$. The
path will visit the output switch of the final gadget if and only if the output of the circuit
is true. ∎

Since the lexmin maximal path is the initial branch of the lexmin depth first search
tree, we have the following corollary. This is an improvement of the result of Reif [Re].

Corollary 2.1. Computing the lexmin depth first search tree is P-complete for planar
graphs.


2.4.  Finding  a  Maximal  Set  of  Disjoint  Paths 

A  second  path  problem  that  can  be  solved  by  a  simple  greedy  algorithm  is  to  find  a 
maximal  set  of  disjoint  paths.  The  problem  is: 

Given a graph $G = (V, E)$ and a subset $U$ of $V$, find a maximal set of vertex
disjoint paths joining the vertices of $U$.

The  set  of  paths  is  required  to  be  maximal  in  the  sense  that  no  more  paths  joining  vertices 
of  U  can  be  added  to  it.  A  greedy  algorithm  solves  this  problem  by  finding  paths  one  at  a 
time  until  no  more  paths  can  be  found.  In  this  section  we  examine  the  problem  when  it  is 
restricted  to  a  layered  directed  acyclic  graph  (dag).  A  layered  graph  has  all  of  its  vertices 
in  levels  with  edges  only  between  consecutive  levels.  The  motivation  for  looking  at  this 
restricted  case  is  that  it  occurs  as  a  subroutine  in  a  number  of  network  flow  and  matching 
algorithms  [HK]. 

We show that the greedy algorithm for finding a maximal set of vertex disjoint paths
in a layered dag is P-complete. The greedy algorithm repeatedly finds paths from the first
level to the last level and removes them. This process is repeated until the first level is
separated from the last level. When a path is removed, some vertices might be separated
from the last level; these vertices are also removed. When the algorithm constructs a
path, it finds the lexmin path between the first and last level in the current graph. The
complexity of the problem does not arise from finding lexmin paths, since they can be
found easily in a dag. The complexity arises from the dependence of a path on the previous
choices of paths.
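
The greedy procedure described above can be sketched in Python. This is a minimal illustration under assumed conventions: the dag is given as a list of levels together with a successor dictionary, and the function name is hypothetical. Because every path from the first level to the last has the same length, the lexmin path is obtained by always stepping to the lowest numbered vertex that can still reach the last level.

```python
def greedy_disjoint_paths(levels, succ):
    """Greedy maximal set of vertex disjoint paths in a layered dag.

    `levels` lists the vertices of each level, first level to last level;
    `succ` maps a vertex to its successors on the next level (both are
    assumed representations). Returns the paths in the order found.
    """
    alive = {v for level in levels for v in level}
    paths = []
    while True:
        # Discard vertices that are separated from the last level.
        reach = set(levels[-1]) & alive
        for level in reversed(levels[:-1]):
            for v in level:
                if v in alive and any(w in reach for w in succ.get(v, ())):
                    reach.add(v)
        alive = reach
        starts = [v for v in levels[0] if v in alive]
        if not starts:
            return paths
        # Build the lexmin path by taking the lowest numbered choice each time.
        path = [min(starts)]
        for _ in levels[1:]:
            path.append(min(w for w in succ.get(path[-1], ()) if w in alive))
        paths.append(path)
        alive -= set(path)


levels = [[1, 2], [3, 4], [5, 6]]
succ = {1: [3, 4], 2: [3], 3: [5], 4: [6]}
found = greedy_disjoint_paths(levels, succ)
```

In this example the greedy choice 1, 3, 5 blocks vertex 2, so only one path is found, although the two disjoint paths 1, 4, 6 and 2, 3, 5 exist. The result is maximal but not maximum, and each path depends on the paths chosen before it.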


Theorem  2.3.  The  greedy  algorithm  for  computing  a  maximal  set  of  vertex  disjoint  paths 
in  a  layered  dag  is  P-complete. 

Proof: The proof is a reduction from the circuit value problem with several minor restrictions.
First, the circuit is restricted to be made up of only inputs, NOT gates, and AND
gates. The fanout of all gates is assumed to be exactly two, although one of the outputs
of a gate need not be connected to anything. The gates are numbered topologically so
that a gate gets its inputs from lower numbered gates. The outputs of gate $\beta_i$ are denoted
$i_1$ and $i_2$, with $i_1$ going to the higher numbered gate receiving an input from $\beta_i$ and $i_2$
going to the lower numbered gate. If only one gate gets an input from $\beta_i$, then that gate
receives $i_2$. It is finally assumed that the AND gates get their inputs from distinct gates,
so $\beta_k = AND(i_1, i_2)$ is not allowed. The circuit value problem clearly remains P-complete
with these restrictions.

Let $\beta = \beta_1, \ldots, \beta_n$ be a circuit satisfying the above conditions. We construct a layered
dag $G$ such that the maximal set of disjoint paths found by the greedy algorithm with input
$G$ corresponds to the evaluation of the circuit $\beta$. The basic structure of $G$ is:

[Figure: basic structure of G, showing the vertices 1, 2, 3, 4, ..., n and the gadgets attached to them]

$G_k$ is the gadget which is associated with the gate $\beta_k$. The circuit is simulated by computing
a sequence of paths $P_1, \ldots, P_n$. The path $P_k$ is the lexmin path in the graph $G$ after
the paths $P_1, \ldots, P_{k-1}$ have been removed. The gate $\beta_k$ is simulated by the path $P_k$. The
path $P_k$ goes from $k$ to $k'$ and lies entirely in $G_k$ except when it tests the inputs to $\beta_k$.
There are two distinguished vertices $k_1$ and $k_2$ in $G_k$ that are visited by $P_k$ if $\beta_k$ is true
and are not visited if $\beta_k$ is false.

The gadgets for the inputs and the gates are illustrated in the figures below. The
graph has $3n + 2$ levels. The output vertices for $G_k$ are numbered $k_1$ and $k_2$. The vertices
$k_1$, $k_2$ and $k_3$ are on levels $3k - 1$, $3k$, and $3k + 1$ respectively. In the figures, all edges are
directed downwards. For a 1-INPUT and a 0-INPUT, the path $P_k$ goes directly from $k$ to
$k'$. The vertices $k_1$ and $k_2$ are visited for a 1-INPUT and are not visited for a 0-INPUT.

In the figure for the NOT gate, $\beta_k = NOT(i_p)$, $p \in \{1, 2\}$, the vertex $l$ is numbered
higher than the vertex $i_p$. If the vertex $i_p$ has not been visited when $P_k$ is constructed,
the path $P_k$ goes through $i_p$ to $k_1$ and $k_2$; otherwise the path goes through $l$ and bypasses
$k_1$ and $k_2$. Since $i_p$ not being visited corresponds to receiving a false value, the gadget
simulates a NOT gate. In the AND gate $\beta_k = AND(i_p, j_q)$, $p, q \in \{1, 2\}$, the path $P_k$ goes
through $k_1$ and $k_2$ if both $i_p$ and $j_q$ have been visited by earlier paths. If $i_p$ or $j_q$ has not
been visited when $P_k$ is constructed, then the path does not go through the vertices $k_1$
and $k_2$. Note that if both $i_p$ and $j_q$ have not been visited, the path $P_k$ goes through both
$i_p$ and $j_q$.


[Figure: gadgets for a 1-INPUT and a 0-INPUT]


In the NOT gate shown below it is essential that the vertex $i_{p+1}$ is visited by some
path before the path $P_k$ is constructed. If this were not the case, then the path $P_k$ would
go out through $i_{p+1}$ into the vertices of a different gadget, not returning to $k_1$ and $k_2$.
This also applies to the vertices $i_{p+1}$ and $j_{q+1}$ in the AND gate. For any gate, the vertex
$i_3$ is on the path $P_i$, so there is no problem for $p = 2$. If $p = 1$, then the gate is the
higher numbered gate to receive an input from $\beta_i$. Suppose $\beta_j$ also gets an input from $\beta_i$.
The path $P_j$ contains the vertex $i_2$ if $i_2$ is not on $P_i$. Hence the vertex $i_{p+1}$ is already on
a path when $P_k$ is constructed. This completes the proof that the circuit is successfully
simulated by the set of disjoint paths found in $G$ by the greedy algorithm. ∎


2.5.  Neighborhood  Heuristics  for  NP-Complete  Problems 

Greedy algorithms are often used to approximate NP-complete problems. The basic
approach is to take some starting solution and attempt to improve it by local changes. The
improvements are repeated until a local maximum is reached. Although these heuristics
are difficult to analyze, they have been found effective in practice.

The greedy algorithms for many neighborhood search schemes appear to be inherently
sequential. It seems to be difficult to perform more than a few steps of the search at a
time. As an example of this we show that an approximation algorithm for the maxcut
problem is P-complete. The maxcut problem is:

Given a graph G = (V, E), find a partition V_1, V_2 of V such that the number
of edges between V_1 and V_2 is maximized.

The heuristic that is used is to move vertices between the two sets as long as moves increase
the number of edges between the two sets. First some initial partition is chosen. Then a
vertex is found that has more neighbors in its own set than in the other and it is moved to
the other set. This step is repeated until each vertex has more neighbors in the opposite
set than in its own set. Since the number of edges between V_1 and V_2 is increased by each
move, there are at most |E| phases.
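
The heuristic can be sketched in a few lines. This is only an illustration under our own assumptions: the names `local_search_maxcut` and `cut_size`, the adjacency-dictionary representation, and the rule of always moving the lowest-numbered eligible vertex are not taken from the text.

```python
def local_search_maxcut(adj, side):
    """Move one vertex at a time while a move increases the cut.

    adj  : dict mapping each vertex to the set of its neighbors (undirected)
    side : dict mapping each vertex to 0 or 1 (the initial partition)
    Returns the final partition and the vertices moved, in order.
    """
    side = dict(side)
    moves = []
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):  # assumption: scan vertices in increasing order
            same = sum(1 for w in adj[v] if side[w] == side[v])
            other = len(adj[v]) - same
            if same > other:  # moving v gains (same - other) cut edges
                side[v] = 1 - side[v]
                moves.append(v)
                improved = True
                break  # rescan from the start after every move
    return side, moves


def cut_size(adj, side):
    """Number of edges with endpoints on opposite sides."""
    return sum(1 for v in adj for w in adj[v] if v < w and side[v] != side[w])
```

Each move strictly increases the cut, so the loop performs at most |E| moves; on a 4-cycle that starts with all vertices on one side, two moves suffice to cut every edge.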

An alternative way to view this scheme is as a coloring problem. The goal is to color
the vertices with two colors such that each vertex has more neighbors of the opposite
color than of its own color. Such a coloring is said to be stable. A stable coloring can be
computed by switching the colors of vertices one at a time until the coloring is stable. A
valid swap occurs when a vertex with more neighbors of its own color than of the opposite
has its color changed. We show that given a graph with an initial assignment of colors,
it is P-complete to compute a sequence of valid swaps that reaches stability. This shows
that the greedy algorithm for approximating maxcut is P-complete. For this problem, we
require that the output include the intermediate computations that are made. We do this
so that all of the color swaps are valid. Our result does not imply that it is P-complete
to compute a stable coloring. It is an open problem whether it is possible to compute a
stable coloring with a fast parallel algorithm. An NC algorithm is known for computing a
stable coloring of a graph with maximum degree three [KSS].

Theorem 2.4. Given a graph G = (V, E) and an initial assignment of colors to the
vertices, it is P-complete to compute a sequence of valid swaps that reaches a stable
coloring.

Proof: The proof is a reduction from the monotone circuit value problem. Let β =
(β_1, ..., β_n) be a monotone circuit. A graph G is constructed such that a sequence of
valid swaps that reaches stability corresponds to the evaluation of the circuit. There is
a subgraph for each gate with an initial assignment of the colors R/B. The OR gate
is shown below. The gates are connected together in a manner that corresponds to the
connections of the circuit. The subgraph for β_k has three distinguished vertices: the vertex
k is associated with the gate's inputs and the vertices k_1 and k_2 are associated with the
gate's outputs. If β_k gives outputs to β_l and β_m (l < m), then there are edges (k_1, l)
and (k_2, m). In the illustration of the OR gate, the vertices labelled with B have color B
and are connected to two vertices with color R (not shown in the figure). These vertices
will never have more B neighbors than R neighbors, so they are colored B throughout the
simulation. Similarly the vertices labelled R have color R and are connected to two vertices
with color B. The AND gate has the same structure as the OR gate except that the two
vertices labelled x are not present. A 0-INPUT β_k is a vertex k with color R connected to
two vertices with color B and a 1-INPUT β_k is a vertex k with color B connected to two
vertices of color R. If the input β_k goes to the gate β_l, then there is an edge (k, l).

The circuit is evaluated by finding a sequence of valid swaps that reaches a stable
coloring. The graph G has the property that the coloring achieved by any maximal sequence
of valid swaps is unique. The graph also has the property that any vertex can have its color
changed at most one time. The initial coloring of the gate β_k is stable, except possibly
for the vertex k. If the vertex k is recolored R then other vertices become unstable and
eventually the vertices k_1 and k_2 are recolored B. If the vertex k is recolored R, then the
gate β_k evaluates to true. When the vertex k is colored R, the swaps propagate so that
the vertices k_1 and k_2 have their colors changed to B. For the OR gate β_k, if one of the
vertices corresponding to its inputs is recolored, then the vertex k is recolored. Similarly,
for the AND gate β_k, if both of its inputs are recolored, then the vertex k is recolored.
A maximal sequence of swaps simulates the evaluation of the gates in roughly topological
order. A gate is not set to true (i.e. the colors switched in the associated subgraph), until
enough of its inputs are known to be true to make the gate true. The simulation proceeds
until the subgraphs that correspond to all gates that evaluate to true have had their colors
switched. □


2.6.  Additional  P-Complete  Algorithms 

Many other sequential algorithms can be shown to be P-complete. Here are a few
additional P-complete algorithms. All of our P-completeness proofs for these algorithms
are reductions from variants of the circuit value problem. The P-completeness proofs for
the high degree subgraph problem and first fit bin packing are given in Chapter 4. The
other proofs are left to the interested reader.
High Degree Subgraph Problem

PROBLEM: Given a graph G = (V, E) and an integer k, construct the maximum induced
subgraph that has all vertices of degree at least k.

SEQUENTIAL ALGORITHM: The sequential algorithm for this problem discards vertices of
degree less than k one at a time until all remaining vertices have degree at least k.
Comment: The problem of determining if a graph has a nonempty induced subgraph with
minimum degree k is also P-complete. Various methods of finding approximate solutions
to the high degree subgraph problem are discussed in Chapter 4.
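
A minimal sketch of the sequential discarding procedure follows. The function name `high_degree_subgraph` and the adjacency-dictionary input are our own illustrative choices, not notation from the text.

```python
def high_degree_subgraph(adj, k):
    """Return the vertex set of the maximum induced subgraph of minimum degree k.

    adj : dict mapping each vertex to the set of its neighbors (undirected)
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    low = [v for v in adj if len(adj[v]) < k]     # vertices to discard
    while low:
        v = low.pop()
        if v not in adj:       # already discarded earlier
            continue
        for w in adj.pop(v):   # delete v and update its neighbors
            adj[w].discard(v)
            if len(adj[w]) < k:
                low.append(w)
    return set(adj)
```

Discarding one vertex can push a neighbor below degree k, which is the chain of dependencies that makes the procedure look sequential.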

First  Fit  Bin  Packing 

PROBLEM: Given a finite set U of items with sizes s(u) ∈ Z+ for each u ∈ U and a bin
capacity B, construct a first fit packing of the items into the bins.

SEQUENTIAL ALGORITHM: The sequential algorithm considers the items in the given order
and places each item in the first bin with enough room left for the item.
Comment: This problem remains P-complete if the items are in decreasing order, but can
be solved in NC if the items are in increasing order. The problem is discussed further in
Chapter 4.
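
For concreteness, the first fit rule can be written as below. This is a sketch under our own assumptions: the name `first_fit`, the list-of-sizes input, and the returned list of bin indices are illustrative.

```python
def first_fit(sizes, capacity):
    """Pack items in the given order; return the bin index chosen for each item."""
    free = []        # remaining room in each open bin, in order of opening
    assignment = []
    for s in sizes:
        if s > capacity:
            raise ValueError("item larger than the bin capacity")
        for b, room in enumerate(free):
            if room >= s:            # first bin with enough room
                free[b] -= s
                assignment.append(b)
                break
        else:                        # no open bin fits: open a new one
            free.append(capacity - s)
            assignment.append(len(free) - 1)
    return assignment
```

The bin chosen for an item depends on the room left by all earlier items, so the placement of each item is determined by the entire prefix of the packing.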

Alternating  Breadth-first  Search 

PROBLEM: Given a graph G = (V, E) with the edges partitioned into two sets M and
U, and a distinguished vertex r ∈ V, construct an alternating breadth first search from r.
An alternating breadth first search is a partition of the vertices into levels, with edges in
U going from even levels to odd levels, and edges from M going from odd levels to even
levels. A vertex v is on level i for i even (i odd) if it is not on any level less than i and
there is a vertex w on level i - 1 with (v, w) ∈ M ((v, w) ∈ U). The vertex r is on level 0.
SEQUENTIAL ALGORITHM: The sequential algorithm assigns the vertices to levels, one
level at a time.
Comment: This labelling procedure arises in some sequential matching algorithms when
looking for augmenting paths [MV]. A variant of the algorithm which collapses blossoms
when they are encountered is also P-complete.
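
The level-by-level assignment can be sketched as follows, without blossom handling. The function name `alternating_bfs` and the edge-list inputs are assumptions made for illustration.

```python
def alternating_bfs(M, U, r):
    """Assign levels: U-edges lead to odd levels, M-edges lead to even levels.

    M, U : iterables of undirected edges (a, b)
    r    : the distinguished start vertex, placed on level 0
    Returns a dict vertex -> level for every vertex that receives a level.
    """
    def adjacency(edges):
        adj = {}
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        return adj

    adj_m, adj_u = adjacency(M), adjacency(U)
    level = {r: 0}
    frontier, i = [r], 0
    while frontier:
        i += 1
        adj = adj_u if i % 2 == 1 else adj_m  # edge class used to reach level i
        nxt = []
        for w in frontier:
            for v in sorted(adj.get(w, ())):
                if v not in level:            # not on any lower level
                    level[v] = i
                    nxt.append(v)
        frontier = nxt
    return level
```

In the example used to check this sketch, an M-edge at the root and a U-edge leaving an odd level are both ignored, since they have the wrong class for the level being built.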

Travelling  Salesman  2-Opting 

PROBLEM: Given a graph G = (V, E) with edge weights w(e) for each e ∈ E and
an initial tour T_0, find a sequence of tours T_0, ..., T_m such that T_i is the result of a 2-opt
[KL] of T_{i-1}, the cost of T_i is less than the cost of T_{i-1}, and either T_m is a locally optimal
tour or m > |V|. A 2-opt refers to a neighborhood transformation done on tours of the
graph.

SEQUENTIAL ALGORITHM: Transformations are applied one at a time until a local
optimum is reached.

Comment: It is necessary to put a bound on the number of transformations, since examples
are known where an exponential number of transformations may be made before a local
optimum is reached [Lue].
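
A bounded sequence of improving 2-opt moves might be generated as in the sketch below. It assumes a symmetric weight matrix `w` on a complete graph and takes the first improving move found; the names `tour_cost`, `improving_move`, `two_opt_sequence` and the parameter `max_moves` are hypothetical.

```python
def tour_cost(w, tour):
    """Total weight of the closed tour."""
    n = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))


def improving_move(w, tour):
    """Return (i, j) for the first 2-opt move that lowers the cost, else None."""
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these two tour edges share a vertex
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            # replace edges (a, b) and (c, d) by (a, c) and (b, d)
            if w[a][c] + w[b][d] < w[a][b] + w[c][d]:
                return i, j
    return None


def two_opt_sequence(w, tour, max_moves):
    """Apply improving 2-opt moves one at a time, at most max_moves times."""
    tours = [list(tour)]
    while len(tours) - 1 < max_moves:
        cur = tours[-1]
        move = improving_move(w, cur)
        if move is None:          # locally optimal tour reached
            break
        i, j = move
        tours.append(cur[:i + 1] + cur[i + 1:j + 1][::-1] + cur[j + 1:])
    return tours
```

The explicit `max_moves` bound plays the role of the bound on m in the problem statement.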

2.7.  Discussion 

In this chapter, we have shown that a number of greedy algorithms seem to be inherently
sequential. However, there is one important greedy algorithm which has a parallel
implementation, namely Kruskal's minimum spanning tree algorithm. The algorithm constructs
a minimum spanning tree by considering the edges in order of their weights. It maintains
as its solution set a collection of trees. Any edge which joins separate trees is added to the
solution. This algorithm can be converted to a parallel algorithm by considering each edge
independently. An edge is in the minimum spanning tree if and only if it joins distinct
connected components of the edges that come before it. If the edges are ordered by their
edge numbers, then this algorithm finds the lexmin spanning tree.
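
The edge-by-edge characterization can be illustrated as follows. This sketch runs the independent tests one after another; in a parallel setting each call could be evaluated separately with a parallel connectivity routine. The names `in_spanning_tree` and `greedy_spanning_tree` are our own.

```python
def in_spanning_tree(edges, index):
    """Decide membership of edges[index] using only the edges that precede it.

    edges : list of (u, v) pairs, already sorted in the order of consideration
    The edge is in the tree iff its endpoints lie in distinct connected
    components of the graph formed by edges[:index].
    """
    u, v = edges[index]
    adj = {}
    for a, b in edges[:index]:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {u}, [u]
    while stack:                      # explore the component of u
        x = stack.pop()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return v not in seen


def greedy_spanning_tree(edges):
    """Collect the edges that pass the independent membership test."""
    return [e for i, e in enumerate(edges) if in_spanning_tree(edges, i)]
```

No call depends on the outcome of another call, which is the property that separates this greedy algorithm from the P-complete ones of this chapter.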

The reason that the greedy algorithm for the minimum spanning tree problem can
be implemented as a fast parallel algorithm is that it is easy to know if an edge is in the
solution set without knowing exactly what the solution set is when the edge is considered.
The minimum spanning tree problem is a special case of the maximum independent set
problem for weighted matroids. As long as the matroid has a rank function which can
be computed by a fast parallel algorithm, then the associated greedy algorithm can be
parallelized.

There are a number of interesting open problems concerning greedy algorithms. One
of the most important is the status of the greedy algorithm for computing a maximal
matching. This algorithm computes the lexmin maximal matching. Although this
problem bears a close resemblance to the lexmin maximal independent set problem, it is not
known to be P-complete. A P-completeness proof of lexmin maximal matching would be
significant since it would imply that weighted matching is P-complete.

In this chapter we introduced the notion of a P-complete algorithm. This notion
provides a means to identify techniques that probably will not work for a particular problem,
and to direct the search for algorithms in more promising directions. The specific results
of this chapter show that for quite a few problems, a greedy approach is not likely to


Chapter  3  Path  Problems 


3.1.  Introduction 

In this chapter we present parallel algorithms for a number of simple combinatorial
problems. The problems that are examined all deal with finding certain paths in graphs.
The most important results of this chapter are that it is possible to find a maximal path
with an RNC algorithm and that a depth first search tree can be constructed in O(n^{1/2+ε})
time. Some of the results in this chapter are complementary to the results of the previous
chapter. The greedy algorithms for several problems studied in this chapter are P-complete,
but these problems can be solved by fast parallel algorithms when different approaches are
taken.

The first algorithm that we give is a simple probabilistic algorithm for finding a long
path in a dense graph. Given a graph with all vertices of degree at least m, a path of
length m - o(m) is constructed with an RNC algorithm. The second problem we look at
is finding a maximal set of disjoint paths. The problem is: given a graph G = (V, E) and a
subset U of the vertices, find a maximal set of vertex disjoint paths with their endpoints
in the set U. We show that this problem can be solved in NC for graphs with bounded
degree. We then show that a maximum set of disjoint paths can be found in RNC using
the Karp-Upfal-Wigderson matching algorithm. The major result of this chapter is that
the maximal path problem can be solved in RNC. The maximal path problem is:

Given a graph G = (V, E) and a vertex r, find a simple path starting from r that
cannot be extended without encountering a vertex that is already on the path.

We also show that the restricted case of the maximal path problem for bounded degree
graphs can be solved in NC. Our final result is that a depth first search tree of an n vertex
graph can be constructed in parallel time O(n^{1/2+ε}).

There are a number of reasons to look for parallel algorithms for problems such as
these. A major reason is to gain an understanding of the types of problems that are
in NC and RNC. Although these problems are fairly simple in nature, it is by no means
obvious that they can be solved by fast parallel algorithms. These problems are not closely
related to other problems known to be in NC or RNC, so the positive results increase the
variety of problems that can be solved by fast parallel algorithms. A second reason to look
at particular problems is to identify techniques to use in parallel algorithms. There are
relatively few general techniques used in parallel algorithms, so it is hoped that by looking
at new problems, additional techniques can be discovered and added to the repertoire.

Some of the problems discussed in this chapter are important problems in their own
right. In particular, depth first search is one of the major open problems in parallel
computation. The algorithm in this chapter is the first sublinear algorithm for depth first
search. Much of the work in this chapter was motivated by the depth first search problem.
The initial reason for studying the maximal path problem is its close relationship to depth
first search.


3.2.  Finding  a  Long  Path  in  a  Graph 


The first problem that we look at is the problem of finding a long path in a dense
graph. Let G = (V, E) be an n-vertex graph with all vertices of degree at least m. The
graph clearly has a path of length at least m. This problem can be solved sequentially
by the greedy algorithm discussed in the previous chapter, since any maximal path in G
has length at least m. However, this problem is a little trickier to solve with a parallel
algorithm. One plausible approach is to generate a random walk in the graph and take
as the path the segment of the walk up until the walk's first intersection with itself. This
does not work since if the graph is a complete graph, then the expected length of the path
constructed by this method is just O(√n).

The algorithm that we give for this problem is a probabilistic algorithm. The basic
idea is to construct a subgraph in which a long path can be found easily. The subgraph is
constructed by making random choices of the edges. The randomization is done in a way
where the choices of edges are not fully independent, so that the resulting graph has a long
path with high probability. The algorithm finds a path of length at least m/(c log n), where c is
a small constant. The algorithm runs in expected time O(log n) using O(n^2) processors.
The algorithm can be run several times to construct a path of length m - o(m) in RNC.
As long as the path has length less than (1 - 1/log n)m, the graph formed by deleting the
path has all vertices of degree at least m/log n, so a path can be found of length m/(c log^2 n). Thus
at most c log^2 n paths need to be found to construct a path of length m - o(m).

The first step in the algorithm is to randomly label the vertices of the graph. Each
vertex independently and uniformly picks a label from the set {0, ..., d - 1} where d =
⌊m/(c log n)⌋. For a random labelling, it is very likely that each vertex has at least one neighbor
with label i for every i ∈ {0, ..., d - 1}. We denote the set of labels of neighbors of v by
L(v). A labelling with L(v_i) = {0, ..., d - 1} for all v_i ∈ V is referred to as a good labelling.

Lemma 3.1. For a random labelling, L(v_i) = {0, ..., d - 1} for all v_i ∈ V with probability
at least 1 - 1/n.

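Proof: Let c = 3 ln 2. The lemma is proved by bounding the probability that there is
some vertex which is not adjacent to vertices with all of the labels in {0, ..., d - 1}.

\begin{align*}
P[\exists v_i : L(v_i) \ne \{0,\dots,d-1\}]
  &\le \sum_{v_i \in V} P[L(v_i) \ne \{0,\dots,d-1\}] \\
  &\le \sum_{v_i \in V} \sum_{0 \le j \le d-1} P[j \notin L(v_i)] \\
  &\le \sum_{v_i \in V} \sum_{0 \le j \le d-1} \left(1 - \frac{1}{d}\right)^{m} \\
  &\le n^2 \left(1 - \frac{1}{d}\right)^{d c \log n} \\
  &\le \frac{1}{n}.
\end{align*}

The last step uses (1 - 1/d)^d ≤ 1/e and, with logarithms taken to base 2, e^{-c log n} = n^{-3}.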


The labelling step is repeated until a good labelling is found. Once a good labelling
is found, each vertex picks one of its neighbors a(v) with a label one greater than its
label; thus if vertex v has label k, it picks one of its neighbors with label (k + 1) mod d.
An auxiliary graph is constructed with vertices v and an edge from each vertex v to the
associated vertex a(v). A typical example of the auxiliary graph is illustrated below. Since
this graph has |V| edges, it must have a cycle. On the cycle, the labels increase by exactly
one going from a vertex to its neighbor, so the graph has a cycle of length at least d. This
cycle can be found by path doubling in O(log n) time.
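
The labelling and successor construction can be simulated sequentially as in the sketch below. Several details are our own assumptions: the name `find_long_cycle`, the choice of the smallest eligible neighbor as a(v), the use of d at least 3, and the plain pointer-following loop that stands in for path doubling.

```python
import random


def find_long_cycle(adj, d, rng):
    """Return a cycle whose labels advance by one (mod d) along each edge.

    adj : dict mapping each vertex to the set of its neighbors
    d   : number of labels (assumed to be at least 3 here)
    rng : a random.Random instance supplying the random labels
    """
    all_labels = set(range(d))
    while True:  # repeat the labelling step until a good labelling is found
        label = {v: rng.randrange(d) for v in adj}
        if all({label[w] for w in adj[v]} == all_labels for v in adj):
            break
    # a(v): a neighbor whose label is one greater (mod d); smallest one chosen
    succ = {v: min(w for w in adj[v] if label[w] == (label[v] + 1) % d)
            for v in adj}
    # follow successors until a vertex repeats; that vertex lies on a cycle
    v, seen = next(iter(adj)), set()
    while v not in seen:
        seen.add(v)
        v = succ[v]
    cycle, w = [v], succ[v]
    while w != v:
        cycle.append(w)
        w = succ[w]
    return cycle
```

Because labels advance by exactly one around the cycle, its length is a positive multiple of d.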


3.3. Finding a Maximal Set of Disjoint Paths

Another  path  problem  is  to  find  a  maximal  set  of  disjoint  paths.  The  problem  is: 
Given a graph G = (V, E) and a subset U of the vertices, find a maximal set
P = {P_1, ..., P_k} of vertex disjoint paths that join vertices of U.

This means that no more paths can be added to P that have their endpoints in U. We
require that the paths are nontrivial (i.e., they contain at least two vertices), and that
vertices of U only appear in the paths of P as endpoints.

The maximal set of disjoint paths problem is a generalization of maximal matching.
If U = V, then the problem is to find a maximal matching in the graph. The maximal
set of disjoint paths problem is an important subroutine for a number of sequential and
parallel algorithms. The directed variant of the problem is a key step in the Hopcroft-Karp
bipartite matching algorithm [HK]. We use an algorithm for finding a maximal set of
disjoint paths in our depth first search algorithm. We present two algorithms for finding
disjoint paths. The first algorithm is an NC algorithm for graphs with bounded degree.
The second algorithm is actually for a more difficult problem: finding a maximum set
of disjoint paths instead of just finding a maximal set of disjoint paths. The algorithm
reduces the problem to matching, so it can be solved in RNC using the
Karp-Upfal-Wigderson matching algorithm. It is straightforward to generalize our second algorithm
to directed graphs. However, the bounded degree algorithm applies only to undirected
graphs.

When the maximum degree of the graph is bounded by d, a maximal set of disjoint
paths can be found in O(d log^3 n) time. Thus if d is O(log^k n), the problem can be solved
in NC. The algorithm uses the iterated improvement strategy mentioned in Chapter 1.
Each phase of the algorithm finds a number of disjoint paths and deletes them from the
graph.  The  number  of  vertices  in  U  is  reduced  by  a  factor  of  about  1  -  |  each  phase. 

A  phase  begins  by  finding  a  spanning  tree  of  the  graph.  Then  a  maximal  set  of  disjoint 
paths  is  found  in  the  spanning  tree.  The  paths  are  then  deleted  and  another  phase  is  run. 
If  the  graph  becomes  disconnected,  the  separate  components  are  considered  independently. 

A maximal set of disjoint paths in the tree is constructed by following paths from the
vertices of U up towards the root. Whenever two or more paths intersect, two of the paths
are joined. For example, in the figure below, paths would be found between the pairs of
vertices (u_1, u_3), (u_4, u_5), and (u_7, u_8). The vertices u_2 and u_6 would be left for the next
phase.


[Figure: example tree with the vertices of U labelled]


It is not difficult to find the paths quickly in parallel. One method is to assign values
to the edges of the tree: 1 indicates it is a path edge, and 0 indicates it is not a path edge.
If a vertex is not in U, then the edge coming out of it has value 1 if exactly one of the
edges coming into it has value 1 and the value is 0 otherwise. For a vertex in U, the value
of its outgoing edge is 1 if all of the edges coming into it have value 0 and the value is 0
otherwise. The values of all the edges can be computed by treating the tree as a type of
circuit and using the standard technique for evaluating circuits with fanout one.
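
The two rules for the edge values can be checked with a small sequential sketch. The recursion below stands in for the parallel circuit evaluation, and the name `path_edge_values` together with the parent-map representation of the rooted tree are assumptions made for illustration.

```python
def path_edge_values(parent, root, in_U):
    """Compute the 0/1 value of the edge leaving each non-root vertex.

    parent : dict mapping each non-root vertex to its parent in the tree
    root   : the root of the tree
    in_U   : set of vertices that belong to U
    """
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)
    value = {}

    def visit(v):
        # number of incoming edges (from children) that carry value 1
        ones = sum(visit(c) for c in children.get(v, []))
        if v in in_U:
            out = 1 if ones == 0 else 0   # U vertex: 1 iff all incoming are 0
        else:
            out = 1 if ones == 1 else 0   # other vertex: 1 iff exactly one is 1
        if v != root:
            value[v] = out
        return out

    visit(root)
    return value
```

In the test tree, two U vertices below a common vertex a are joined there, so the edge above a gets value 0, while a third U vertex sends a path edge all the way to the root and is left for the next phase.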

Whenever some paths are joined at a vertex, some other paths might be cut off.
The tree can be partitioned into connected components that consist of the edges that are
assigned the value 1. There is at most one component that contains exactly one vertex of
U. For the other components, the worst case is if a vertex is in U and all of its incoming
edges represent paths. Since the maximum degree is assumed to be d, this vertex accounts
for two vertices in U being joined, and d - 1 being left unjoined. Thus, the number of
vertices in U that are joined is at least (2/(d+1))(|U| - 1). The number of phases of the algorithm
is bounded by O(d log |U|). Each phase takes O(log^2 n) time.


3.4.  Finding  a  Maximum  Set  of  Disjoint  Paths 


We now turn our attention to finding a maximum set of disjoint paths instead of just
a maximal set of disjoint paths. The maximum set of disjoint paths problem (MDP) is:

Given a graph G = (V, E) and a set of vertices U, find a maximum cardinality
set of nontrivial vertex disjoint paths that have their endpoints in U.

This problem is a much more difficult problem than finding a maximal set of disjoint paths.
If U = V, then the problem is to find a maximum matching. Finding disjoint paths is the
central step in our maximal path algorithm. We show that MDP is in RNC by reducing
it to matching.

If the problem were to find a maximum set of disjoint paths from a set U_1 to a set
U_2, it could be expressed as a flow problem with unit capacities and then it could be
reduced to bipartite matching [ET]. However, since matching is a special case of MDP, the
reduction is a little more difficult. Instead of reducing MDP to a flow problem, we reduce
it to a bidirectional flow problem [PS ex. 8.6] [Law] and then reduce the bidirectional flow
problem to a matching problem. A bidirected graph is a set of vertices, a set of directed
edges, and a set of bidirected edges. A directed edge a → b can carry a unit of flow from
a to b. A bidirected edge a ↔ b can either give a unit of flow to both a and b, or give no
flow to either. A bidirected edge can be thought of as a special source that must give the
same amount of flow to both of its neighbors. The flow problem is to determine how much
flow can be delivered to a sink vertex t.

Lemma  3.2.  MDP  can  be  reduced  in  log-space  (O(logn)  parallel  time)  to  a  unit  capacity 
bidirectional  flow  problem. 

Proof: We transform MDP into a bidirectional flow problem, where the flow corresponds
to a set of disjoint paths. Each vertex v is replaced by a pair of vertices v_in and v_out,
with a directed edge v_in → v_out between them. For an edge (v, w) in the graph there are
directed edges v_out → w_in and w_out → v_in and a bidirected edge v_in ↔ w_in as shown:


There is a sink t, and for each vertex u_out corresponding to u ∈ U, there is an edge
u_out → t. The problem is to find a maximum flow to t. The bidirectional edges serve as
the sources of the flow.
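
The construction in the proof is mechanical, as the following sketch suggests. The function name `build_bidirected_flow_graph`, the tuple encoding `(v, "in")` / `(v, "out")` of the split vertices, and the string `"t"` for the sink are hypothetical choices made for this illustration.

```python
def build_bidirected_flow_graph(vertices, edges, U):
    """Build the directed and bidirected edge lists of the flow graph.

    vertices : iterable of vertex names of the original graph
    edges    : iterable of undirected edges (v, w)
    U        : iterable of the distinguished vertices
    """
    directed, bidirected = [], []
    for v in vertices:
        directed.append(((v, "in"), (v, "out")))    # v_in -> v_out
    for v, w in edges:
        directed.append(((v, "out"), (w, "in")))    # v_out -> w_in
        directed.append(((w, "out"), (v, "in")))    # w_out -> v_in
        bidirected.append(((v, "in"), (w, "in")))   # v_in <-> w_in
    for u in U:
        directed.append(((u, "out"), "t"))          # u_out -> t
    return directed, bidirected
```

Each vertex and edge of the input is handled independently of the others, which is consistent with the claim that the transformation can be carried out in log-space.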

There is a direct correspondence between a 0-1 flow of 2k in the flow graph and a
set of k disjoint paths in the original graph. Suppose the bidirected graph has a flow
of 2k. The flow is introduced on k bidirected edges. The flow introduced at v_in ↔ v'_in
follows paths v_in v_out v_{1,in} ··· u_{j,in} u_{j,out} t and v'_in v'_out v'_{1,in} ··· to the sink. This
corresponds to the path u_j ··· v_1 v v' v'_1 ··· u_i in the graph. Similarly, suppose we have a set
of k disjoint paths in the graph with their endpoints in U. For each path we pick an edge
(v, v') and introduce flow on the bidirected edge v_in ↔ v'_in. The flow then follows the two
segments of the path to the sink. □


Lemma 3.3. Unit capacity bidirectional flow can be reduced in log-space to matching.

Proof: We use a reduction that is similar to the standard reduction of unit capacity flow
to  bipartite  matching  [CSV]  [Wag].  The  handling  of  bidirected  edges  causes  the  reduction 
to  be  to  general  matching  instead  of  bipartite  matching.  A  graph  is  constructed  that  has 
a  perfect  matching  if  and  only  if  the  bidirected  graph  has  a  flow  of  size  2k.  The  maximum 
flow  is  found  by  constructing  graphs  for  each  possible  value  of  k.  The  flow  in  the  network 
can  be  reconstructed  from  a  perfect  matching. 

The table below gives the graph used to test for a flow of 2k in the bidirected graph.
The outdegree of a vertex v is denoted by out(v).

Bidirected Graph          Matching Graph

Sink t                    Vertices t_1, ..., t_2k

Vertex i                  Vertices i_1, ..., i_out(i)

Edge i → j                Vertex ij
                          Edges (ij, i_1), ..., (ij, i_out(i))
                                (ij, j_1), ..., (ij, j_out(j))

Edge i → t                Vertex it
                          Edges (it, i_1), ..., (it, i_out(i))
                                (it, t_1), ..., (it, t_2k)

Bidirected Edge i ↔ j     Vertices ij_1, ij_2
                          Edges (ij_1, ij_2)
                                (ij_1, i_1), ..., (ij_1, i_out(i))
                                (ij_2, j_1), ..., (ij_2, j_out(j))

There is a direct correspondence between a flow of 2k and a perfect matching. In a
perfect matching, if ij is matched with j_p there is flow on i → j, and if ij is matched
with i_p there is no flow on i → j. For a bidirected edge i ↔ j, if ij_1 is matched with i_p
and ij_2 is matched with j_q, then i ↔ j delivers flow to both i and j; if ij_1 is matched with
ij_2, then i ↔ j does not deliver any flow.

We  now  prove  using  the  above  correspondence  that  the  graph  has  a  perfect  matching 
if  and  only  if  there  is  a  flow  of  2k. 

Suppose the graph has a perfect matching. The vertices i_p are matched for 1 ≤ p ≤
out(i). Suppose a of the vertices i_p are matched to vertices of the form vi and the other
out(i) - a vertices i_p are matched to vertices iv. There is a flow of a going into vertex i.
Since there are out(i) vertices of the form iv, a of the vertices iv are matched to vertices
v_q, so there is a flow of a out of i. Hence, flow is conserved at the vertices, so it is a valid
flow. Since the vertices t_1, ..., t_2k are all matched with vertices of the form vt, the flow is
of size 2k.

For the other direction of the proof, assume there is a flow of size 2k. Suppose the
flow into i is a. Then a vertices vi and out(i) - a vertices iv can be matched with vertices
i_p corresponding to i. So all of the vertices associated with the vertices in the original
graph can be matched. The vertices ij_1 and ij_2 can be matched together if flow is not
introduced on i ↔ j. Since the flow is of size 2k, all the vertices t_1, ..., t_2k are matched. □

An  example  of  the  reduction  from  bidirectional  flow  to  matching  is  shown  in  the 
following  diagram.  The  matching  illustrated  with  bold  edges  corresponds  to  a  bidirectional 
flow introduced on the edge b ↔ c that flows to t along the paths c → f → t and b → a → t.


[Figure: a bidirected graph and the corresponding matching graph.]


Combining the two previous lemmas and using the Karp-Upfal-Wigderson probabilistic
matching algorithm [KUW], we have the following theorem:

Theorem 3.1. The maximum set of disjoint paths problem can be solved by an RNC
algorithm. ∎


3.5.  The  Maximal  Path  Problem 

In  this  section  we  present  two  algorithms  for  the  maximal  path  problem.  The  maximal 
path  problem  is: 

Given a graph G = (V, E) and a vertex r ∈ V, find a simple path P starting at
r  such  that  P  cannot  be  extended  without  encountering  a  vertex  that  is  already 
on  the  path. 

Our results are that the restricted case of the maximal path problem for bounded degree
graphs can be solved in NC and the general case can be solved in RNC.

The maximal path problem can be solved sequentially by the simple greedy algorithm
discussed in Chapter 2. We showed that the greedy algorithm for constructing a maximal
path is P-complete, so it probably cannot be sped up substantially with parallelism. Here
we show that by taking a different approach, the problem can be solved by a fast parallel
algorithm.
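For reference, the sequential greedy approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is given as an adjacency dictionary `adj` (a hypothetical representation, not taken from the text), and the tie-breaking rule (take the first unvisited neighbor in adjacency order) is an assumption as well.

```python
def greedy_maximal_path(adj, r):
    """Sequential greedy construction of a maximal path starting at r.

    adj maps each vertex to an iterable of its neighbors (undirected graph).
    The choice of the first unvisited neighbor is an illustrative assumption.
    """
    path = [r]
    on_path = {r}
    while True:
        # Pick any neighbor of the current endpoint that is not yet on the path.
        nxt = next((u for u in adj[path[-1]] if u not in on_path), None)
        if nxt is None:
            return path  # the endpoint has no unvisited neighbor, so the path is maximal
        path.append(nxt)
        on_path.add(nxt)
```

Each step depends on the set of vertices already placed on the path, which is the sequential dependence that the parallel algorithms of this section avoid.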

It is a significant result that the maximal path problem can be solved by a fast parallel
algorithm. The construction of a maximal path appears to be a very sequential process,
since to add a vertex to the path we need to know that the vertex is not already on the
path. The initial motivation for looking at the maximal path problem is its relationship to
depth first search. Any branch of a depth first search tree is a maximal path. The maximal
path problem captures some of the difficulties involved with a parallel depth first search.
In the next section we describe an O(n^(1/2+ε)) parallel algorithm for depth first search. The
algorithm works along the same lines as the maximal path algorithm and uses a number of
tools developed for it. However, our depth first search algorithm does not depend directly
upon the maximal path algorithm.

Both  of  our  maximal  path  algorithms  use  a  divide  and  conquer  strategy.  A  path  is 
found  which  allows  the  problem  to  be  reduced  to  finding  a  maximal  path  in  a  graph  of  less 
than  half  the  original  size.  This  path  is  referred  to  as  a  splitting  path.  If  the  graph  has 
a  vertex  of  low  degree,  then  a  splitting  path  can  be  found  relatively  easily.  However,  the 
general  case  is  substantially  more  complicated.  To  find  a  splitting  path  in  a  general  graph, 
we use the probabilistic matching algorithm of Karp-Upfal-Wigderson. We first describe
the  algorithm  to  find  a  maximal  path  in  a  graph  with  all  vertices  of  degree  less  than  d, 
and  then  describe  the  algorithm  for  general  graphs. 


3.5.1.  Maximal  Path  Algorithm  for  Bounded  Degree  Graphs 

Let G = (V, E) be a graph with all vertices having degree at most d. We give a
deterministic algorithm to find a maximal path in time O(d log^3 n). This gives an NC
algorithm for families of graphs with a degree bound of O(log^j n).

The basic step of the algorithm is to find a path that reduces the problem to finding
a maximal path in a smaller graph.

Definition 3.1. A path P starting from r is called a splitting path if V − P has at least
two connected components.

Lemma 3.4. If splitting paths can be found in time O(T(n)), then a maximal path can
be found in time O(T(n) log n).

Proof: Suppose P = r u_1 ··· u_k is a splitting path. Let u_j be the last vertex on P such
that u_j is adjacent to at least two components of V − r u_1 ··· u_j. Let C be the smallest
of the components of V − r u_1 ··· u_j adjacent to u_j and let v be a vertex in C that is
adjacent to u_j. If P' is a maximal path from v in C, then r u_1 ··· u_j P' is a maximal path
from r in V, so the problem is reduced to finding a maximal path in C. Since |C| < n/2, it
takes at most log n iterations of finding a splitting path and reducing the problem to find
a maximal path. Hence the maximal path problem can be solved in O(T(n) log n) time. ∎

We now describe how to find a splitting path. We first take care of the case where
the graph is not biconnected. If r has degree one, then we can follow a path from r until
we reach a vertex of degree at least three, so we may as well assume that r has degree at
least two. If the graph is not biconnected, there is an articulation point v that is in the
same biconnected component as r. Any shortest path from r to v is a splitting path.

The  more  interesting  case  for  finding  a  splitting  path  is  if  the  graph  is  biconnected. 
For  the  remainder  of  the  section  we  assume  that  G  is  biconnected.  The  basic  idea  in 
finding  a  splitting  path  is  to  pick  a  vertex  v  different  from  r  and  find  a  path  that  cannot 
be extended to contain any more neighbors of v. This either gives us a splitting path or a
way to construct a maximal path directly. Let v be a vertex different from r. We construct
a path, one segment at a time, with each segment going to another neighbor of v without
passing through v. We stop when we cannot add another neighbor of v to the path. This
is  done  by  the  following  simple  algorithm. 

VisitNeighbors(r, v)
begin
    Let v_1, ..., v_m be the neighbors of v;
    P ← r;
    w ← r;
    V' ← V − v;
    for i ← 1 to m do
        if there is a path w P' v_i with w P' v_i ⊆ V' then
        begin
            P ← P P' v_i;
            V' ← V' − w P';
            w ← v_i;
        end
    return P;
end.
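A sequential Python rendering of this routine may make the bookkeeping easier to follow. It is a sketch under assumptions: the adjacency-dictionary representation and the helper name `find_path` are hypothetical, and breadth first search stands in for the parallel path-finding step.

```python
from collections import deque

def find_path(adj, allowed, src, dst):
    """BFS path from src to dst using only vertices in `allowed`, or None."""
    if src not in allowed or dst not in allowed:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in adj[x]:
            if y in allowed and y not in parent:
                parent[y] = x
                queue.append(y)
    return None

def visit_neighbors(adj, r, v):
    """Build a path from r that greedily visits neighbors of v while avoiding v."""
    path = [r]
    w = r
    allowed = set(adj) - {v}            # V' = V - v
    for vi in adj[v]:
        segment = find_path(adj, allowed, w, vi)
        if segment is None or len(segment) < 2:
            continue                    # v_i is unreachable in V', or v_i is w itself
        path.extend(segment[1:])        # P <- P P' v_i
        allowed -= set(segment[:-1])    # remove the old w and the interior P'
        w = vi                          # v_i stays in V' as the new endpoint
    return path
```

The returned path plays the role of P in the case analysis that follows; the caller appends v and then checks which case applies.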

After we call VisitNeighbors, we add v to P. There are three cases that can occur.
First, if P contains all of v's neighbors, then Pv is a maximal path and we are done.
Otherwise if V − Pv has more than one connected component, then we have a splitting
path. The last case is that all the unvisited vertices are in a single component and some
neighbor of v is not on P. This case is handled by the following lemma.

Lemma 3.5. Let P = r u_1 ··· u_k v be a path such that V − P has a single connected
component C. If v is adjacent to C but u_k is not adjacent to C, then a maximal path can
be found by an NC algorithm.

Proof: Let u_j be the last vertex other than v on P that is adjacent to C. There must be
such a vertex u_j since we are assuming the graph is biconnected. Let x_1 ··· x_l be a path
in C with u_j adjacent to x_1 and v adjacent to x_l. The path r ··· u_j x_1 ··· x_l v u_k ··· u_{j+1} is
a maximal path. The path is shown in the figure below. ∎

The individual steps such as finding a path between two vertices, testing for articulation
points, and finding connected components can all be done in O(log^2 n) time on
O(n^2) processors. Since the problem size is reduced by at least half every time a path is
found that splits the graph, no more than log n stages are needed. If all vertices have degree
at most D(n), no more than D(n) paths are found by VisitNeighbors. The algorithm
therefore runs in O(D(n) log^3 n) time on O(n^2) processors. The restriction that all of the
vertices obey a global degree bound is not necessary for this algorithm to be a fast parallel
algorithm. It is only necessary that at each stage a vertex of low degree can be found. For
example, if the graph is planar, it is always possible to find a vertex of degree at most five,
so this algorithm runs in O(log^3 n) time for planar graphs.

3.5.2.  Maximal  Path  Algorithm  for  General  Graphs 

We now describe an algorithm for the maximal path problem in general graphs. This
algorithm also relies on finding a splitting path and reducing the problem to a problem of
less than half the size. The problem of finding a splitting path is much more involved than
in the previous algorithm. The discussion of finding a splitting path is divided into two
parts. A set Q of vertex disjoint paths is said to separate the graph if V − Q has at least two
connected components. We first show how we can construct a splitting path from a set of
paths that separates the graph. The construction is similar to the one for finding a splitting
path in a sparse graph discussed above. We then describe how the separating paths are
found. This is the most complicated part of the algorithm and relies on some very powerful
machinery. The algorithm uses as a subroutine the algorithm for finding a maximum set
of disjoint paths described in Section 3.1. The resulting algorithm is probabilistic, but its
only use of randomness is in the matching subroutine in the algorithm for finding disjoint
paths.

We  now  describe  how  a  splitting  path  is  found.  Once  the  splitting  path  is  found, 
the  algorithm  proceeds  as  the  one  discussed  above.  We  also  assume  that  the  graph  is 
biconnected. A splitting path can be found in a graph that is not biconnected as is done
in  the  bounded  degree  case. 

We show how to construct a splitting path from a small set of vertex disjoint paths
that separates the graph. Suppose Q = {Q_1, ..., Q_k} is a set of vertex disjoint paths. A
subroutine that builds a single path using the paths in Q is used in the construction of
a splitting path. The routine ExtendPath(r, Q, V) constructs a path starting from r that
cannot be extended to include any more vertices of Q. In other words, if the constructed
path is P = r u_1 ··· u_k, then no vertex lying on any path in Q is contained in any connected
component of V − P adjacent to u_k. Such a path is said to be maximal with respect to
Q. ExtendPath is essentially a sequential greedy algorithm. It builds the path by adding
segments of the paths in Q to the current path for as long as possible. The only use of
parallelism is in the low-level routines which find shortest paths and maintain connected
components. The routine is:


ExtendPath(r, Q, V)
begin
    P ← ∅; s ← r;
    while there is a path in V − P from s to a vertex in Q do
    begin
        Let s u_1 ··· u_k be the shortest path in V − P from s to Q;
        Suppose u_k ∈ Q_i, and Q_i = v_1 q' u_k q'' v_2 with |v_1 q'| ≤ |q'' v_2|;
        P ← P s u_1 ··· u_k q'';
        Q_i ← v_1 q';
        s ← v_2;
    end
    return Ps;
end.

Each iteration of the while loop halves the length of some path in Q, so if there are
initially c paths, there can be at most c log n iterations. Each of the steps within the while
loop (such as finding shortest paths) can be done in O(log^2 n) time, so the total time is
O(c log^3 n).
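The routine can be mirrored by a short sequential Python sketch. The adjacency-dictionary input, the representation of Q as a list of vertex lists, and the use of breadth first search for the shortest-path step are all assumptions made for illustration; the sketch makes no attempt at the parallel running time.

```python
from collections import deque

def extend_path(adj, r, paths, vertices):
    """Sequential sketch of ExtendPath(r, Q, V).

    adj: dict vertex -> iterable of neighbors; paths: list of vertex lists
    (the disjoint paths Q); vertices: the vertex set V to work in.
    Returns a path from r that is maximal with respect to Q.
    """
    paths = [list(q) for q in paths if q]      # work on a copy of Q
    P, s = [], r
    while True:
        owner = {x: i for i, q in enumerate(paths) for x in q}
        allowed = set(vertices) - set(P)
        # BFS from s inside V - P until the first vertex of Q is reached.
        parent, queue, hit = {s: None}, deque([s]), None
        while queue:
            x = queue.popleft()
            if x in owner:
                hit = x
                break
            for y in adj[x]:
                if y in allowed and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if hit is None:
            return P + [s]                     # no vertex of Q is reachable from s
        walk = []                              # s u_1 ... u_k with u_k = hit
        x = hit
        while x is not None:
            walk.append(x)
            x = parent[x]
        walk.reverse()
        q = paths[owner[hit]]
        idx = q.index(hit)
        if idx > len(q) - 1 - idx:             # orient so that |v_1 q'| <= |q'' v_2|
            q.reverse()
            idx = len(q) - 1 - idx
        seg = walk + q[idx + 1:]               # s u_1 ... u_k q'' v_2
        P.extend(seg[:-1])
        s = seg[-1]
        paths[owner[hit]] = q[:idx]            # Q_i <- v_1 q'
```

Because the longer side of Q_i is always absorbed into P, the part of Q_i that remains is at most half its previous length, which is the fact used in the iteration bound above.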

To  show  how  to  construct  a  splitting  path  from  a  set  of  paths  that  separates  the 
graph,  we  begin  with  the  special  case  where  we  have  a  single  path  which  contains  all  the 
neighbors  of  a  vertex. 

Lemma 3.6. Let Q_1 be a path and v a vertex different from r that has all of its neighbors
on Q_1. There is an NC algorithm that finds either a splitting path or a maximal path.

Proof: If v lies on Q_1, let Q = {Q', Q''} be the two segments formed by removing v from
Q_1, otherwise let Q = {Q_1}. Construct a path P = r u_1 ··· u_k that is maximal with respect
to Q in V − v by calling ExtendPath(r, Q, V − v). There are four cases to consider.

1) Suppose u_k is not adjacent to v. If there were a path from u_k to v that contained
no vertices of P other than u_k, then P could be extended to contain an additional
neighbor of v without including v. Hence the component of V − P that contains v is
not adjacent to u_k, so P is either maximal, or V − P has more than one component.

Assume that u_k is a neighbor of v and let P' = Pv.

2) If all neighbors of v are on P, then P' is a maximal path.

3) If V − P' has more than one component then P' is a splitting path.

4) The final case is when V − P' has a single connected component C and v is adjacent
to a vertex in C. This is precisely the case that is covered in Lemma 3.5, so a maximal
path can be constructed. ∎

Theorem 3.2. Let Q = {Q_1, ..., Q_k} be a set of vertex disjoint paths where k ≤ c for
some fixed constant c. If V − Q has more than one connected component then a splitting
path or a maximal path can be found by an NC algorithm.

Proof: Use ExtendPath to construct a path P that is maximal with respect to Q. Suppose
that P is not a splitting path, so that V − P has a single connected component C. If Q ⊄ P,
then C is not adjacent to the last vertex of P, so P must be a maximal path. Assume that
Q ⊆ P and let x_1 and x_2 be vertices in different components of V − Q. If both x_1 and
x_2 had neighbors in C, then there would be a path from x_1 to x_2 with all of its interior
vertices in C, but this would mean that there was a vertex of Q in C. Hence, either x_1 or
x_2 has all its neighbors on P. The previous lemma then shows that a splitting path or a
maximal path can be constructed. ∎

We now show how to construct a small set of paths that separates the graph. The
paths that we construct are referred to as isolated.

Definition 3.2. A set of vertex disjoint paths Q = {Q_1, ..., Q_k} is isolated if every path
between endpoints of different paths has at least one interior vertex in Q.

A set of at least two isolated paths can be transformed into a set of paths that separates
the graph. Let Q = {Q_1, ..., Q_k} be a set of isolated paths, and let x_1 be an endpoint of
the path Q_1 and x_2 be an endpoint of Q_2. The paths Q' = {Q_1 − x_1, Q_2 − x_2, Q_3, ..., Q_k}
separate the graph with x_1 and x_2 in different components of V − Q'.
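Definition 3.2 can be checked directly, which may help when experimenting with small examples. The following Python sketch assumes an adjacency-dictionary graph and non-empty paths given as vertex lists (both hypothetical representations). It uses the observation that a set of paths fails to be isolated exactly when two endpoints of different paths are adjacent, or are both adjacent to the same connected component of the vertices not on any path.

```python
def is_isolated(adj, paths):
    """Check Definition 3.2 for a list of non-empty, vertex-disjoint paths."""
    on_paths = {x for q in paths for x in q}
    # Label the connected components of the vertices not on any path.
    comp = {}
    for start in adj:
        if start in on_paths or start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in on_paths and y not in comp:
                    comp[y] = start
                    stack.append(y)
    endpoint_owner = {}
    for i, q in enumerate(paths):
        endpoint_owner[q[0]] = i
        endpoint_owner[q[-1]] = i
    touched = {}                        # component label -> index of a path touching it
    for e, i in endpoint_owner.items():
        for y in adj[e]:
            if endpoint_owner.get(y, i) != i:
                return False            # edge between endpoints of different paths
            if y in comp and touched.setdefault(comp[y], i) != i:
                return False            # a route through V - Q links two different paths
    return True
```

A set that passes this check with at least two paths can then be turned into a separating set by dropping one endpoint from each of two paths, as described above.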

A set of isolated paths can be constructed from a set of paths by repeatedly combining
paths. A maximum set of disjoint paths is found between the endpoints of the paths. If
a path P is found between endpoints of Q_i and Q_j, i ≠ j, then the paths Q_i, P and Q_j
are joined to form a single path. Phases of joining paths are repeated until no more paths
can be joined. The routine JoinPaths constructs a set of isolated paths. The input to
JoinPaths is a set of vertex disjoint paths Q = {Q_1, ..., Q_k} and a set of vertices T not
on the paths. JoinPaths constructs a set of isolated paths Q' = {Q'_1, ..., Q'_l}. Every path
in Q is a segment of some path in Q'.

JoinPaths(Q, T)
begin
    while the paths in Q are not isolated do
    begin
        Construct an auxiliary graph G' with vertices x_i for each Q_i ∈ Q, and vertices v_i for
        each v_i ∈ T. The edge (v_i, v_j) is in G' if (v_i, v_j) is an edge in the original graph;
        (x_i, v_j) is in G' if there is an edge from an endpoint of Q_i to v_j, and (x_i, x_j) is in
        G' if there is an edge between endpoints of Q_i and Q_j, i ≠ j;

        Find a maximum set of disjoint paths P = {P_1, ..., P_l} in G' with their endpoints in
        {x_1, ..., x_k};

        for each P_h ∈ P do in parallel
        begin
            Suppose P_h = x_i P'_h x_j with i < j;
            Q_i ← Q_i P'_h Q_j; (joined appropriately)
            Q_j ← ∅;
            T ← T − P'_h;
        end
    end
end.

Earlier in this chapter we showed how a maximum set of disjoint paths can be computed
in RNC by using matching. The manipulation of paths and the construction of the
auxiliary graph can be done easily in NC. To show that JoinPaths is an RNC algorithm,
we show that the number of phases (i.e. iterations of the outer loop) is O(log n).


Lemma 3.7. The process of joining the paths {Q_1, ..., Q_m} requires O(log m) phases.

Proof: Suppose k joins are performed at phase j. Any subsequent join must involve
at least one unjoined endpoint of a path joined at phase j. Thus there are at most 2k
subsequent joins. The number of paths that are joined at a phase is no greater than the
number joined the previous phase. Therefore, there are at most k/2 joins at phase j + 4,
since otherwise phases j + 1 through j + 4 would together perform more than 2k joins.
Because there are at most m/2 joins during the first phase, there are O(log m) phases. ∎

In the maximal path algorithm, it is important to ensure that the joining process does
not result in a single path. The following lemma gives a simple case where the joining
process does not form a single path.

Lemma 3.8. If at most m/3 − 1 joins are performed in the first phase of the joining process,
then there are at least two paths left when the joining process is done.

Proof: The total number of joins performed is at most m/3 − 1 + 2(m/3 − 1) = m − 3. The
number of joins required to combine the paths into a single path is m − 1. ∎

The set of paths joined in a single phase depends upon the particular set of disjoint
paths found by the probabilistic matching algorithm. The number of paths joined in a
phase is however a fixed number, independent of the probabilistic choices made. We denote
the number of paths joined in the first phase of JoinPaths by J(Q_1, ..., Q_k). The JoinPaths
procedure does not actually require finding a maximum set of disjoint paths to construct
isolated paths; it is sufficient to find a maximal set of disjoint paths. However, for technical
reasons explained later, our maximal path algorithm relies on the fact that JoinPaths does
find a maximum set of disjoint paths. A second reason for using a maximum set of disjoint
paths is that it is not known how to construct in parallel a maximal set of disjoint paths
without constructing a maximum set of disjoint paths.

A major portion of the maximal path algorithm is to construct a small set of isolated
paths. We construct a set of isolated paths Q = {Q_1, ..., Q_k} where 2 ≤ k ≤ 5. (There
is nothing special about the bound of five; it is just necessary to have k ≤ c, for
some constant c.) An initial set of isolated paths is constructed by calling JoinPaths(V, ∅)
(treating each vertex as a path of length one). If we are very lucky and a single path is
constructed, then we can find a path starting at r that contains at least half the vertices
and reduce the problem without bothering with a splitting path. If between two and five
paths are constructed, a splitting path can be found directly; otherwise the number of
paths must be reduced. The basic idea is to discard some of the paths and use the vertices
of the discarded paths as well as the vertices not on any of the paths to join the remaining
paths. This process is repeated until between two and five isolated paths remain. It is
easy to reduce the number of paths significantly with a phase, since any number of paths
may be discarded. It is necessary to make sure that at least two paths remain when the
paths are joined.

The algorithm for reducing the number of paths works in phases, where each phase
reduces the number of paths on hand by at least a constant factor. A phase starts with
a set of isolated paths Q = {Q_1, ..., Q_m}. To reduce the number of paths in Q, we
choose a suitable k and replace Q by {Q_1, ..., Q_k} and join the paths in the new Q. The
first value of k that we try is k = 3m/4. If J(Q_1, ..., Q_{3m/4}) < m/4 − 1, then Lemma 3.8
guarantees that the paths do not collapse to a single path. Suppose J(Q_1, ..., Q_{3m/4}) ≥ m/4 − 1.
Since J(Q_1, ..., Q_m) = 0, there is a k ≥ 3m/4 such that J(Q_1, ..., Q_k) ≥ m/4 − 1 and
J(Q_1, ..., Q_{k+1}) < m/4 − 1. This value can be found by checking all values of k in parallel.
This value of k still might not be satisfactory, since joining Q_1, ..., Q_k could cause the paths
to collapse to a single path, while joining Q_1, ..., Q_{k+1} might not give enough reduction. If
this is the case, we find a segment P of Q_{k+1} such that m/4 − 2 ≤ J(Q_1, ..., Q_k, P) ≤ m/4 − 1.
If we remove a vertex from Q_{k+1}, we change the number of joins in the first phase of
JoinPaths by at most two. This is because we change the auxiliary graph used in JoinPaths
by two vertices. When finding a maximum set of disjoint paths in a graph that has been
altered in this way, the number of paths found changes by at most two. This "continuity"
property is why we chose to use a maximum set of disjoint paths instead of settling for
maximal. We can find this segment P by testing each of the initial segments of Q_{k+1} in
parallel. When we join paths, we guarantee that the number of paths is reduced by at
least m/4 − 2 since the first phase performs at least that many joins.

ReducePaths(Q = {Q_1, ..., Q_m}, V')
begin
    if J({Q_1, ..., Q_{3m/4}}) < m/4 − 1 then
        JoinPaths({Q_1, ..., Q_{3m/4}}, V' ∪ {Q_{3m/4+1}, ..., Q_m});
    else
    begin
        Find a k ≥ 3m/4 such that J(Q_1, ..., Q_k) ≥ m/4 − 1 and J(Q_1, ..., Q_{k+1}) < m/4 − 1;
        Suppose Q_{k+1} = p_1 ··· p_s;
        Find an initial segment P = p_1 ··· p_{s'} of Q_{k+1} such that
            m/4 − 2 ≤ J(Q_1, ..., Q_k, P) ≤ m/4 − 1;
        JoinPaths({Q_1, ..., Q_k, P}, V' ∪ {Q_{k+2}, ..., Q_m} ∪ {p_{s'+1}, ..., p_s});
    end
end.

This  completes  our  description  of  the  pieces  of  our  algorithm  for  finding  a  maximal 
path.  We  now  put  them  together  and  give  the  full  algorithm. 

MaximalPath(V, r)
begin
    P ← FindSplittingPath(V, r);
    if P is a maximal path then
        return P;
    Let r u_1 ··· u_k be a subpath of P such that V − r u_1 ··· u_k has at least two connected
    components adjacent to u_k. Suppose C is the smallest of the components adjacent to
    u_k and v ∈ C is adjacent to u_k;
    P' ← MaximalPath(C, v);
    return r u_1 ··· u_k P';
end.

FindSplittingPath(V, r)
begin
    if V is not biconnected then
    begin
        Let v be an articulation point in the same biconnected component as r;
        Let P be a shortest path from r to v;
        return P;
    end
    else
    begin
        comment First we find a set of paths that separates the graph;
        Q = {Q_1, ..., Q_m} ← JoinPaths(V, ∅);
        if m = 1 then
            reduce the problem directly;
        V' ← ∅;
        while m > 5 do
            ReducePaths(Q, V');
        Let x_1 be an endpoint of Q_1 and x_2 be an endpoint of Q_2, x_1, x_2 ≠ r;
        Q' ← {Q_1 − x_1, Q_2 − x_2, Q_3, ..., Q_m};

        comment The paths separate the graph; we now find the splitting path;
        P ← ExtendPath(r, Q', V);
        if P is a splitting path or a maximal path then
            return P;
        without loss of generality, assume x_1 has all of its neighbors on P;
        if x_1 ∈ P then
            Q'' ← {P_1, P_2} where P_1 and P_2 are the segments of P − x_1;
        else
            Q'' ← {P};
        P ← ExtendPath(r, Q'', V − x_1);
        if P is a splitting path then
            return P;
        else if P x_1 is a splitting path or a maximal path then
            return P x_1;
        else
        begin
            Construct a maximal path P' using Lemma 3.5;
            return P';
        end
    end
end.

Each of the calls to MaximalPath reduces the problem by half, so it is called at
most log n times. The most time consuming steps in FindSplittingPath are the calls to
ReducePaths. ReducePaths runs in polylog time and is called O(log n) times so the entire
procedure runs in polylog time. Hence, we have the following theorem:
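The halving step in MaximalPath (choosing the prefix r u_1 ··· u_k, the smallest adjacent component C, and the entry vertex v) can be sketched sequentially in Python. The adjacency-dictionary input and the function names are assumptions for illustration, and the components are found by a plain graph search rather than a parallel routine. Following Lemma 3.4, the sketch takes the last vertex of the path that qualifies.

```python
def components(adj, vertices):
    """Connected components of the subgraph induced by the set `vertices`."""
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y in vertices and y not in seen:
                    seen.add(y)
                    comp.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps

def reduction_step(adj, vertices, path):
    """Return (prefix, C, v) for the last usable prefix of `path`, or None.

    prefix ends at a vertex adjacent to at least two components of
    vertices - prefix; C is the smallest such component and v is a vertex
    of C adjacent to the end of prefix.
    """
    for j in range(len(path) - 1, -1, -1):
        prefix = path[:j + 1]
        rest = set(vertices) - set(prefix)
        nbrs = set(adj[prefix[-1]])
        adjacent = [c for c in components(adj, rest) if c & nbrs]
        if len(adjacent) >= 2:
            C = min(adjacent, key=len)
            v = next(iter(C & nbrs))
            return prefix, C, v
    return None
```

Since C is the smallest of at least two disjoint components, it contains fewer than half of the vertices, which is what bounds the recursion depth by log n.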

Theorem 3.3. The maximal path problem is in RNC.


3.6. Depth First Search


In  this  section  a  parallel  algorithm  for  depth  first  search  is  described.  The  depth  first 
search  problem  is: 

Given a graph G = (V, E) and a vertex r ∈ V, find a spanning tree T of G that

could  be  constructed  by  some  depth  first  search  of  G  with  root  r. 

Our algorithm runs in O(√n log^k n) time and uses a polynomial number of processors. The
original motivation for looking at the maximal path problem is its relation to depth first
search. Any branch from the root to a leaf in a depth first search tree is a maximal path.
The maximal path algorithm does not apply directly to give a fast parallel algorithm for
depth first search, since O(n) maximal paths might be necessary to construct a depth first
search tree. However, the general techniques and tools developed for the maximal path
problem are used for depth first search.
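To make the problem statement concrete, the following Python sketch tests whether a given rooted spanning tree could have been produced by some depth first search. It relies on the standard characterization for undirected graphs: a rooted spanning tree is a depth first search tree exactly when every graph edge joins a vertex to one of its ancestors. The input conventions (an adjacency dictionary `adj` and a `parent` dictionary with `parent[r] = None`) are assumptions made for the example.

```python
def is_dfs_tree(adj, parent, r):
    """True if the rooted tree `parent` could come from a DFS of adj from r.

    parent maps every vertex to its tree parent, with parent[r] = None.
    """
    if set(parent) != set(adj) or parent.get(r, 0) is not None:
        return False
    children = {v: [] for v in adj}
    for v, p in parent.items():
        if v != r:
            if p not in adj[v]:
                return False               # tree edges must be graph edges
            children[p].append(v)
    # Entry and exit times give constant-time ancestor tests.
    tin, tout, clock = {r: 0}, {}, 0
    stack = [(r, iter(children[r]))]
    while stack:
        v, it = stack[-1]
        child = next(it, None)
        clock += 1
        if child is None:
            tout[v] = clock
            stack.pop()
        else:
            tin[child] = clock
            stack.append((child, iter(children[child])))
    if len(tin) != len(adj):
        return False                       # parent pointers do not form a tree rooted at r

    def is_ancestor(a, b):
        return tin[a] <= tin[b] and tout[b] <= tout[a]

    return all(is_ancestor(u, v) or is_ancestor(v, u)
               for u in adj for v in adj[u])
```

On a triangle, for example, the path-shaped tree passes the test while the star-shaped tree does not, because the edge between the two leaves joins vertices on separate branches.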

The depth first search algorithm uses the same strategy as was used for finding a
maximal path. A partial solution is found that allows the problem to be reduced to
smaller problems which are solved recursively. The algorithm finds a set Q of disjoint
paths such that the size of the largest connected component of V − Q is less than n/2.
These paths are referred to as a separating set of paths. Note that this usage of the term
separate differs from our usage in the description of the maximal path algorithm. An
initial segment of a depth first search tree containing Q is constructed. Depth first search
trees are then found for the remaining components. Since the problem is halved at each
level, the depth of recursion is at most log n. The procedures for finding separating paths
and constructing an initial segment of the tree both take O(√n log^k n) time. We first
describe the construction of the initial segment and then discuss the more complicated
step of finding the separating paths.

The routine InitialSegment is given a set of disjoint paths Q and constructs a subtree
T' of some depth first search tree T with all the vertices of Q contained in T'. In the depth
first search algorithm, there will be O(√n log n) paths in Q when InitialSegment is called.
InitialSegment is essentially a sequential algorithm; however it uses the parallel routine
ExtendPath described above. InitialSegment maintains the connected components of the
vertices not in the subtree T'. A component is said to be active if it contains a path from
the set Q.

InitialSegment(Q, r)
begin
    T' ← r;
    while there is an active component of V − T' do
    begin
        Let v be the lowest vertex on T' adjacent to an active component C of V − T';
        P ← ExtendPath(v, Q, C);
        Add P to T';
        Recompute the connected components of V − T';
    end
end.




Each phase of the routine ExtendPath reduces the length of some path in Q by a factor
of at least one half. If there are initially m paths in the set Q, then InitialSegment will
take O(m log^k n) time.

Lemma  3.9.  The  tree  T'  constructed  by  InitialSegment  can  be  extended  to  a  depth  first 
search  tree. 

Proof:  It  suffices  to  show  that  there  are  no  paths  between  separate  branches  of  T'  that 
have  all  their  interior  vertices  in  V  —  T' .  This  condition  holds  throughout  the  execution 
of  InitialSegment  since  the  extensions  are  made  at  the  lowest  vertex  adjacent  to  some 
component. ∎

We now show how to find a separating set of paths. We construct a set Q of disjoint
paths where Q contains O(√n log n) paths and the largest component of V − Q has size
at most n/2. The procedure runs in O(√n log^k n) time. The routine Separate constructs a
set of paths and then attempts to reduce the number of paths while keeping the connected
components of the vertices not on the paths small. The paths are reduced by joining
them together or by removing vertices from them. The routine maintains several sets of
paths. The set Q contains paths that are in the separator. Once a path is put into Q it is
committed to the separating set and is not removed. The set S stores the paths that
are currently being worked on. The set R is used for temporary storage; paths are put
into R to set them aside for the next phase. The vertices not on any of the paths in Q, R,
or S are in a set T. The size of the largest component of T is at most n/2. The algorithm
runs until R and S are empty.

In Separate an iteration of the outer while loop is referred to as a phase and an
iteration of the inner while loop is a subphase. Each phase halves the number of paths in
S. A subphase reduces the length of each path in S by one. Subphases are repeated until
all of the vertices in paths of S have been moved to R or T. The paths of R are then
moved back to S for the next phase. Phases are repeated until R and S are empty. In the
routine below, Join(S, T) is a procedure that performs the first phase of JoinPaths(S, T).
Join(S, T) finds a maximum set of vertex disjoint paths in T between the endpoints of the
paths in S. The paths in S are then joined using the disjoint paths that were found.

Separate
begin
    Q ← ∅; R ← ∅; T ← ∅;
    S ← V;
    while S ≠ ∅ do
    begin
        while S ≠ ∅ do
        begin
            Join(S, T);
            Move the joined paths from S to R;
            Move one endpoint of each remaining path in S to T;
            if there is a component in T of size > n/2 then
                Fix the component size by moving a vertex from T to Q;
        end
        S ← R; R ← ∅;
        Move paths of length ≥ √n from S to Q;
    end
end.

The goal of the routine Separate is to remove all the paths from S and R while making
sure that the largest connected component of T remains of size at most n/2. Separate
accomplishes this by alternately joining paths of S and moving some vertices from
paths in S to T. The paths are removed in two different ways. When paths are joined,
any paths of length at least $\sqrt{n}$ are put into Q. The other way paths can be removed is if
all of their vertices are put into the set T.

A subphase begins by joining as many paths as possible using the vertices of T. The
paths of S that are joined are then put into R and set aside until the next phase. The
next step is to move one endpoint of each of the paths remaining in S to T. The moving
of endpoints accomplishes a dual purpose: it makes new vertices of S endpoints, allowing
additional joins in the next subphase, and reduces the lengths of the paths in S. Before
the endpoints are moved, each connected component of T is adjacent to the endpoints of at
most one path. This means that when the endpoints are moved from S to T, the only way
components of T are merged is if they are adjacent to the endpoints of the same path in
S. At most one component of size greater than n/2 can be formed each subphase. If a large
component is formed, then removing the endpoint that caused it to merge reduces the
components to size less than n/2. When the vertex is removed, it is placed into Q as a path
of length one.

The following lemmas establish that Separate constructs the desired separator in
$O(\sqrt{n}\log^k n)$ time.

Lemma 3.10. The largest connected component of T has size at most n/2.

Proof: The only time vertices are added to T is when endpoints of paths in S are moved
to T. When this is done, if a component of size greater than n/2 is formed, a vertex can be
moved from T to Q to reduce the components to size at most n/2. ∎

Lemma 3.11. There are at most $\sqrt{n}$ subphases per phase.

Proof: At the start of a phase, the paths in S have length at most $\sqrt{n}$. Each subphase
reduces the lengths of all the paths remaining in S by one. ∎

Lemma 3.12. The number of phases is at most log n.

Proof: Each path at the start of a phase is the join of two paths from the previous phase,
so the number of paths in S is halved by each phase. ∎

Lemma 3.13. When Separate is finished, there are $O(\sqrt{n}\log n)$ paths in Q.

Proof: At most $\sqrt{n}$ paths of length $\sqrt{n}$ can be placed in Q. Each subphase places at
most one singleton in Q. Since there are at most $\sqrt{n}\log n$ subphases altogether, there are
$O(\sqrt{n}\log n)$ paths in Q. ∎
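
The counting behind Lemmas 3.11 to 3.13 can be written out as a short worked bound. This
is only a restatement of the argument above. The symbol $q$, for the number of paths of
length at least $\sqrt{n}$ placed in Q, is introduced here for illustration, and the bound
on $q$ assumes only that the paths in Q are vertex disjoint.

```latex
\begin{align*}
\#\text{subphases} &\le (\#\text{phases}) \cdot (\#\text{subphases per phase})
                    \le \log n \cdot \sqrt{n}, \\
q \cdot \sqrt{n} &\le n \quad\Longrightarrow\quad q \le \sqrt{n}, \\
|Q| &\le q + \#\text{subphases}
     \le \sqrt{n} + \sqrt{n}\log n = O(\sqrt{n}\log n).
\end{align*}
```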

A subphase takes polylog time since its running time is dominated by the time needed
to find a maximum set of disjoint paths. Since Separate has at most $\sqrt{n}\log n$ subphases,
Separate runs in $O(\sqrt{n}\log^k n)$ time. The resulting algorithm is probabilistic since it uses
the algorithm of Section 3.4 to find a maximum set of disjoint paths. The depth first search
algorithm does not actually require a maximum set of disjoint paths; it would suffice to
find a maximal set of disjoint paths. However, as was mentioned above, it is not currently
known how to construct a maximal set of disjoint paths without finding a maximum set of
disjoint paths. For graphs with bounded degree, the deterministic algorithm for finding a
maximal set of disjoint paths can be used, so for that restricted case the depth first search
algorithm is deterministic with approximately the same time bound.


3.7.  Discussion 

In this chapter we have presented parallel algorithms for several path problems. The
major results of the chapter were that a maximal path can be found by an RNC algorithm
and that a depth first search tree can be constructed in parallel time $O(n^{1/2+\epsilon})$. The
algorithms of this chapter illustrate a number of techniques for parallel algorithms. None
of the algorithms depends on path doubling, although path doubling is present in a number
of implementation details. The algorithm for finding a long path used a novel probabilistic
approach. Probabilistic methods seem important for quite a few parallel algorithms. The
algorithms for finding a maximal set of disjoint paths and the maximal path algorithm
both make use of the iterated improvement strategy. The maximal set of disjoint paths
algorithm builds up its solution by adding paths to the solution and the maximal path
algorithm reduces the size of a separator by joining paths. The algorithm for finding a
maximum set of disjoint paths reduces the problem to matching.

There are a number of open problems related to these path problems. By far the
most important open problem is whether depth first search can be solved by an RNC or
NC algorithm. The major obstacle to speeding up our algorithm to an RNC algorithm is
that it appears difficult to reduce the lengths of the paths substantially when joining paths
while still making sure that the connected components of the vertices not on the paths
remain small. Although these difficulties are technical in nature, it might be necessary
to take a different approach to get a substantially faster depth first search algorithm. A
second interesting open problem is whether the maximal path problem can be solved in NC.
One way this could be done is to find a deterministic algorithm for matching. It would
also be interesting to find a simpler algorithm than ours for the maximal path problem,
even if it still relied on randomness. Our results on the maximal path problem and depth
first search do not carry over to directed graphs; the directed variants of the problems are
open. A final open problem is whether a maximal set of disjoint paths can be found by a
fast parallel algorithm without using matching.

Chapter 4 Approximating P-Complete Problems


4.1.  Introduction 

In this chapter we investigate various ways of finding approximate solutions to
P-complete problems. The type of approximation that we are interested in is fast parallel
algorithms that give solutions that are close to the desired solution. We present results
for parallel approximation of P-complete problems that are very similar to the results on
sequential approximation of NP-complete problems.

There are a number of reasons to look at approximating P-complete problems. The
main reason is that, since P-complete problems probably are not amenable to fast parallel
solution, it is important and interesting to see what can be done on these problems using
parallelism. A second reason to look at approximation is to develop the theory of parallel
approximation in analogy to the sequential theory. The problems which arise when
looking at approximate solutions are often problems that are close to the “boundary” of
what can and cannot be done efficiently with parallelism; thus they are important for the
general study of parallelism.

In this chapter we look at a number of P-complete problems and examine to what
extent they can be approximated by fast parallel algorithms. The first problem is the high
degree subgraph problem. This problem is to find a vertex induced subgraph of a graph
that has all vertices of degree at least as big as some given k. We show tight bounds
on the degree of approximation achievable for a variant of this problem by a fast parallel
algorithm assuming that P ≠ NC. The next topic that we look at is approximating number
problems; we show that the situation is very similar to that for sequential computation.
Some P-complete number problems can be solved by fast parallel algorithms when the
numbers are small. We give two examples of problems that can be approximated very well
by solving restricted cases of the problem and then relating the solution to the original
problem. We define strong P-completeness analogously to strong NP-completeness so that
we can identify number problems that remain difficult even if the numbers involved are
small. We show that the problem of computing a first fit decreasing bin packing is strongly
P-complete. We also show how a related packing scheme which performs as well as first
fit decreasing can be computed in NC.


4.2. The High Degree Subgraph Problem

The high degree subgraph problem is:

    Given a graph G = (V, E) and an integer k, find the maximum induced subgraph
    of the graph that has all vertices of degree at least k.

It is interesting that this problem is P-complete since it is so simple. Most known
P-complete graph theory problems have some other device, such as weights or an ordering,
that makes them difficult. However, this problem could be called a “purely combinatorial
problem.” The high degree subgraph problem can be approximated in several different
ways. In the next section we derive bounds on the degree of approximation that is possible
for a variant of the problem. We also discuss other approaches that construct subgraphs
with all vertices of high degree.

The high degree subgraph problem can be solved by a simple sequential algorithm.
The algorithm discards vertices of degree less than k one at a time until all vertices have
degree at least k or the graph is empty. The correctness of this algorithm follows from
two easy lemmas. The first lemma establishes that there is a unique maximum induced
subgraph of G with minimum degree at least k. We denote this subgraph by $HDS_k(G)$.

Lemma 4.1. Let S and T be maximum induced subgraphs of G that have minimum
degree at least k. Then S = T.

Proof: The induced subgraph on S ∪ T has minimum degree at least k. Since S and T
are maximum, |S| = |T| = |S ∪ T|, so S = T. ∎

Lemma 4.2. The sequential algorithm outlined above finds $HDS_k(G)$.

Proof: Let S be the induced subgraph found by the algorithm. Since the vertices of S
have degree at least k, $S \subseteq HDS_k(G)$. Suppose that $S \ne HDS_k(G)$. Let v be the first
vertex of $HDS_k(G)$ discarded by the algorithm and let T be the graph just before v is
discarded. Since v has degree less than k in T and $HDS_k(G) \subseteq T$, v must have degree less
than k in $HDS_k(G)$, a contradiction. Hence $S = HDS_k(G)$. ∎
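
The sequential discarding algorithm analysed in Lemma 4.2 can be sketched in a few lines
of Python. This is a minimal illustration, not code from the thesis: the function name
`hds_k` and the adjacency-dictionary representation (each vertex mapped to the set of its
neighbours) are assumptions made for the example.

```python
from collections import deque


def hds_k(adj, k):
    """Return the vertex set of the maximum induced subgraph with minimum degree >= k.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    Vertices of degree less than k are discarded one at a time, as in the text.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:          # a vertex may have been queued more than once
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1    # u loses the edge to v
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


# A triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(hds_k(example, 2))  # the triangle survives
print(hds_k(example, 3))  # nothing survives
```

The order in which low-degree vertices are discarded does not matter, which is what
Lemma 4.1 guarantees; the queue is just one convenient order.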

It is possible that $HDS_k(G)$ is empty. The following lemma due to Erdős [10]
establishes an important case where $HDS_k(G)$ is nonempty.

Lemma 4.3. If a graph has n vertices and m edges then it has an induced subgraph with
minimum degree at least $\lceil m/n \rceil$.

Proof: The proof is by induction on the number of vertices in the graph. The result holds
for graphs consisting of a single vertex. Suppose the result holds for all graphs with fewer
than n vertices. Let G be a graph with n vertices and m edges. If all vertices of G have
degree at least $\lceil m/n \rceil$, then the graph itself is an induced subgraph with minimum degree
at least $\lceil m/n \rceil$. Otherwise we can delete a vertex of minimum degree, say of degree k,
along with its incident edges, leaving a graph with n − 1 vertices and
$m - k > m - \lceil m/n \rceil$ edges. By the induction hypothesis the remaining graph has an
induced subgraph with minimum degree at least $\lceil (m-k)/(n-1) \rceil \ge \lceil m/n \rceil$. ∎

The high degree subgraph problem can be reformulated as a decision problem, HDS,
by asking if a specific vertex v is in $HDS_k(G)$. We show that HDS is P-complete by giving
a reduction from the monotone circuit value problem. We also give a stronger result by
showing that it is P-complete to determine if $HDS_k(G)$ is nonempty.


chains  can  easily  be  identified  by  path  doubling  techniques.  When  the  chains  are  deleted, 
more  vertices  of  degree  one  might  be  created,  however  each  new  vertex  of  degree  one 
requires  the  removal  of  at  least  two  chains,  so  the  number  of  chains  decreases  by  at  least 
half  each  phase. 

The P-completeness result can be made stronger in the sense that the problem of
determining if $HDS_k(G)$ is nonempty is P-complete. In the next section, we discuss the
problem of determining the largest k such that $HDS_k(G)$ is nonempty. The stronger
P-completeness result shows that this problem probably cannot be solved exactly by an NC
algorithm.

Theorem 4.2. The problem of determining if $HDS_k(G)$ is nonempty is P-complete.

Proof: The proof is a reduction from the monotone circuit value problem. The construction
of the previous theorem is modified so that given a monotone circuit $\beta_1, \ldots, \beta_n$,
a graph G is constructed such that $HDS_3(G)$ is nonempty if and only if the output of the
circuit is true.

The subgraphs and connections for the AND gates, OR gates, and 0-INPUTS are the
same as in the previous construction. The subgraph for the 1-INPUT $\beta_k$ is:


If the gate $\beta_j$ receives an input from $\beta_k$, there is an edge from $k_1$ to [...]. There is a binary
tree that has as its leaves the vertices $k_1$ of the 1-INPUTS $\beta_k$. The output vertex of the
subgraph for the final gate is the root of this tree.

The computation of $HDS_3(G)$ simulates the circuit in the same manner as the previous
theorem. If the final output is removed, then all of the vertices of the tree are removed and
then all of the 1-INPUTS are removed. This causes all of the vertices to be removed, so
$HDS_3(G)$ is empty. If the final output is not removed, then the vertices of the tree and the
1-INPUTS are not removed, so a subgraph with minimum degree three is left. ∎


4.3.  Approximations  to  the  High  Degree  Subgraph  Problem 

If a problem is known to be P-complete, there is little hope of finding an NC algorithm
to solve it. As is often done with NP-complete problems, we can lower our sights and
attempt to find an approximate solution. The high degree subgraph problem is well suited
for approximation since it can be rephrased as an optimization problem. The optimization
problem is to ask what is the largest k such that $HDS_k(G)$ is nonempty. This value is
denoted HDS(G). It follows from Theorem 4.2 that it is P-complete to compute HDS(G).
An approximate solution to this problem is to find a k such that
$HDS(G) \ge k \ge \frac{1}{c} HDS(G)$ for some fixed c > 1. We say that this is an approximation
within a factor of c. We show that this high degree subgraph problem can be approximated
to a factor of c for any c > 2 in NC, but cannot be approximated in NC for c < 2, unless
P = NC. This result is analogous to a number of results on approximating NP-complete
problems where lower bounds on the degree of approximation are known assuming that
P ≠ NP. For example it is known that graph coloring cannot be approximated to a factor
of less than two [GJ] and precedence constrained scheduling cannot be approximated to a
factor of less than 4/3.

Theorem 4.3. For any constant c > 2, the optimization problem can be solved by an NC
algorithm to a factor of c.

Proof: Let ε > 0. The following routine Test(V, k) returns an answer which is either
“the graph has no subgraph with minimum degree k”, or “the graph has a subgraph with
minimum degree at least $\frac{1-\epsilon}{2}k$”.

Test(V, k);
begin
    while V ≠ ∅ do
    begin
        U := {v ∈ V | deg(v) < k};
        if |U| < ε|V| then
            return “HDS(G) ≥ ((1 − ε)/2) k”;
        V := V − U;
    end
    return “HDS(G) < k”;
end.

Each iteration of the while loop discards all vertices with degree less than k. Since a
constant fraction of the vertices is discarded in each iteration, there are O(log n) iterations.
The algorithm can be implemented so that a single iteration takes O(log n) time, so it is
an NC algorithm. If Test(V, k) terminates with an empty set of vertices, the graph does
not have a subgraph with minimum degree k. Suppose that Test(V, k) terminates with n'
vertices, returning that $HDS(G) \ge \frac{1-\epsilon}{2}k$. The number of edges when Test(V, k) terminates
is at least $\frac{1-\epsilon}{2}kn'$, so by Lemma 4.3, G has a subgraph with minimum degree at least
$\frac{1-\epsilon}{2}k$. The procedure Test(V, k) is applied for each value of k between 1 and n. A value
k is found where the graph has a subgraph of degree at least $\frac{1-\epsilon}{2}k$ but no subgraph of
degree k + 1. This gives an approximation to within a factor of $\frac{2}{1-\epsilon}$. ∎
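
The routine Test and the search over k described in the proof can be sketched
sequentially as follows. This is an illustrative sketch only: it makes no attempt at the
parallel implementation, and the names `degree_test` and `approx_hds`, as well as the
adjacency-dictionary input (vertex mapped to a set of neighbours), are assumptions
introduced here.

```python
def degree_test(adj, k, eps):
    """Sequential sketch of Test(V, k).

    Returns (True, survivors) when fewer than eps * |V| of the remaining vertices
    have degree below k, which certifies a subgraph of minimum degree at least
    (1 - eps) * k / 2; returns (False, set()) when every vertex is discarded,
    which certifies that no subgraph has minimum degree k.
    """
    alive = set(adj)
    while alive:
        low = {v for v in alive if len(adj[v] & alive) < k}
        if len(low) < eps * len(alive):
            return True, alive
        alive -= low              # discard all low-degree vertices at once
    return False, alive


def approx_hds(adj, eps):
    """Largest k accepted by degree_test; HDS(G) lies between (1 - eps) * k / 2 and k."""
    best = 0
    for k in range(1, len(adj) + 1):
        accepted, _ = degree_test(adj, k, eps)
        if accepted:
            best = k
    return best


# A triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(approx_hds(example, 0.1))
```

Each round of `degree_test` removes at least an eps fraction of the surviving vertices
unless it stops, which is the source of the O(log n) bound on the number of rounds.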

The next theorem shows that the previous result is essentially the best possible
assuming that P ≠ NC. We show that a circuit can be simulated by a graph which has
HDS(G) = 2k if the output of the circuit is true and HDS(G) = k + 1 if it is false. If the
problem could be approximated by a factor of less than two then the following construction
could be used to solve the monotone circuit value problem.

Theorem 4.4. If P ≠ NC, then it is not possible to approximate HDS(G) in NC by a
factor less than two.

Proof: This theorem is proved by giving a log-space transformation of a monotone circuit
to a graph G which has HDS(G) = 2k if the output of the circuit is true, and HDS(G) =
k + 1 if it is false. The figures are for k = 3; the generalization to other values of k is
straightforward. For each AND and OR gate there is a subgraph. These subgraphs are
connected together in a manner that corresponds to connections of the circuit. There are
also subgraphs which are called expanders. An expander is shown below. The expanders
are used to fan out values and are also used in the AND gate. The vertices $i_1, \ldots, i_k$ are
the expander’s inputs and $o_1, \ldots, o_j$ are the expander’s outputs. An expander consists of
a number of levels of k vertices each. Adjacent levels are connected by complete bipartite
graphs. Each level is connected to an output of the expander. The expander is terminated
with two layers that have k + 1 vertices. The vertices of the last level are joined to form a
(k + 1)-clique.


The OR and AND gates are illustrated in the figures below. The gates have two
sets of input vertices $i_1, \ldots, i_k$ and $i'_1, \ldots, i'_k$ and a set of output vertices $o_1, \ldots, o_k$. The
output vertices of a gate are connected to an expander which is connected to the input
vertices of the appropriate gates. The expanders fan out values and ensure that information
is propagated correctly. In the AND gate, the long rectangle is a k-expander. The output
of the final gate is connected to an expander which goes to all of the input vertices of
gates that correspond to connections from 1-INPUTS in the circuit. The input vertices of
gates that receive 1-INPUTS have degree 2k and the input vertices of gates that receive
0-INPUTS have degree k.

The circuit is simulated by computing $HDS_{k+2}(G)$. This “evaluates” the gates in
topological order. A value is represented by a set of k vertices. A group of vertices all with
degree at least 2k indicates true and a group of vertices of degree at most k + 1 indicates false.
In computing $HDS_{k+2}(G)$, the vertices indicating false values are removed. If the input
vertices of an expander have degree k, then all of the vertices of the expander
are removed, whereas if they have degree 2k, they are left. This is the manner in which
values are propagated. When an AND gate is simulated, all of its vertices are removed if
at least one of its inputs is false, and all the vertices of an OR gate are removed if both
of its inputs are false. If the output of the final gate is false, then the expander that goes
back to the 1-INPUTS is removed. This causes all of the remaining vertices to be removed,
so HDS(G) ≤ k + 1. If the final output is true, then a graph with minimum degree 2k is
left, so HDS(G) = 2k. HDS(G) ≥ k + 1 since an expander has a subgraph with minimum
degree k + 1. ∎

[Figure: AND gate using a k-expander]

4.4. Finding a High Degree Subgraph

A second approach to approximating the high degree subgraph problem is to attempt
to find a subgraph with high degree without insisting that it is the maximum subgraph with
that degree. We discuss two algorithms for this type of approximation. One algorithm
discards vertices of low degree until all vertices have a certain degree. This algorithm
constructs a supergraph of $HDS_k(G)$. The algorithm exhibits an interesting relationship
between the time it takes and how good the approximation to $HDS_k(G)$ is. The second
approach is to relate a high degree subgraph to a maximum density subgraph. The problem
of constructing a maximum density subgraph can be reduced to a unit capacity network
flow problem, so it can be solved in RNC.

An approximation to the problem of computing $HDS_k(G)$ is to find a supergraph of
$HDS_k(G)$ that has all vertices with degree at least k/c. The closer c is to one, the better
the approximation. The result of the previous section shows that this is not possible with
an NC algorithm for c a constant less than two, unless P = NC. In this section we give
a family of algorithms for this type of approximation. The algorithm A approximates to
a factor $c_A(n)$ and runs in time $T_A(n)$. The degree of approximation improves as the run
time increases.

The sequential algorithm for computing $HDS_k(G)$ discards vertices of degree less than
k until all vertices have degree at least k. The reason that this algorithm appears inherently
sequential is that it is difficult to predict which vertices eventually have degree less than k.
When vertices are discarded, some other vertices have their degrees reduced to less than k,
and then they are also discarded. It is possible that all vertices that initially have degree
at least k are removed. One way to control the number of vertices that are discarded is to
throw out vertices of degree much less than k. If vertices of degree less than k/4 are removed,
then the number of vertices that initially have degree at least k that get removed is bounded.
This is formalized in the following lemma:

Lemma 4.4. If G = (V, E) is an n vertex graph with p vertices of degree less than k, then
G contains a subgraph with minimum degree k/4 that has at least $n - \frac{3}{2}p$ vertices.

Proof: Suppose vertices of degree less than k/4 are removed until all vertices have degree
at least k/4. Let $S_k$ be the set of vertices that initially have degree at least k that are
removed. Before a vertex of $S_k$ is removed, it must have had $\frac{3}{4}k$ edges removed that go to
it. Each vertex that is removed can have at most k/4 edges which go to members of $S_k$ that
are removed after it is. Putting these two facts together we have
$\frac{3}{4}k|S_k| \le \frac{k}{4}p + \frac{k}{4}|S_k|$, so $|S_k| \le \frac{p}{2}$. Hence at most $\frac{3}{2}p$ vertices are removed. ∎

The lemma provides a way to find a subgraph with minimum degree k/4 in $O(n^{1/2}\log n)$
time assuming that the graph has a subgraph with minimum degree k. If $HDS_k(G)$ is
empty, then the algorithm might terminate with an empty set of vertices. The algorithm
is:

FindSubgraph1(V, k)
begin
    while at least n^{1/2} vertices have degree less than k do
        remove vertices of degree less than k;
    while there is a vertex of degree less than k/4 do
        remove vertices of degree less than k/4;
end.

Each iteration of a loop takes O(log n) time. The first loop cannot be executed more
than $n^{1/2}$ times since it removes at least $n^{1/2}$ vertices each iteration. Lemma 4.4 ensures
that the second loop does not remove more than $\frac{3}{2}n^{1/2}$ vertices, so it also has $O(n^{1/2})$
iterations.

This algorithm can be generalized to one that uses more than two phases of discarding
vertices. In FindSubgraph2, the approximation factor and the runtime depend upon the
function f(n). The algorithm FindSubgraph1 corresponds to FindSubgraph2 with f(n) =
$n^{1/2}$. The algorithm FindSubgraph2 maintains two counts, bound and threshold. A phase
consists of discarding vertices with degree less than bound. As long as there are at least
threshold vertices of degree less than bound, the vertices with degree less than bound are
discarded. When the number of vertices of degree less than bound is smaller than threshold,
the values of bound and threshold are changed. Phases are run until all vertices have degree
at least bound or the graph becomes empty.

FindSubgraph2(V, k)
begin
    threshold := n/f(n);
    bound := k;
    while V ≠ ∅ and there is a vertex of degree less than bound do
    begin
        S := {v | deg(v) < bound};
        if |S| ≥ threshold then
            V := V − S;
        else
        begin
            bound := bound/4;
            threshold := threshold/f(n);
        end
    end
end.

Theorem 4.5. If $HDS_k(G)$ is nonempty, the algorithm finds a subgraph with minimum
degree at least $k / 4^{\log_{f(n)} n}$ in $O(f(n)\log n\,\log_{f(n)} n)$ time.

Proof: A phase is the group of iterations for which bound and threshold have fixed values.
There are $\log_{f(n)} n$ phases, since the value of threshold is reduced by a factor of f(n) at the
end of each phase and threshold is initially n/f(n). When a phase ends, fewer than threshold
vertices have degree less than bound, so the minimum degree of the subgraph that is found
is the value of bound when threshold = 1, which is $k / 4^{\log_{f(n)} n}$. When a phase begins,
there are fewer than f(n) · threshold vertices of degree less than 4 · bound. It follows from
Lemma 4.4 that at most $\frac{3}{2}f(n)$ · threshold vertices are removed by removing vertices of
degree less than bound. Since at least threshold vertices are removed at each iteration, there
are at most $\frac{3}{2}f(n)$ iterations per phase. Hence there are at most $\frac{3}{2}f(n)\log_{f(n)} n$ iterations
altogether. An iteration takes O(log n) time, so the algorithm takes $O(f(n)\log n\,\log_{f(n)} n)$
time. ∎
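
The phase structure of FindSubgraph2 may be easier to follow in executable form. The
sketch below is a sequential rendering under stated assumptions: the function name
`find_subgraph2`, the adjacency-dictionary input, and passing the value of f(n) as a
plain number `f` are all choices made for this illustration, and the requirement f > 1
is added so that threshold actually shrinks.

```python
def find_subgraph2(adj, k, f):
    """Sequential sketch of FindSubgraph2.

    adj maps each vertex to the set of its neighbours; f is the value f(n) > 1.
    Returns (survivors, final bound, number of loop iterations).
    """
    if f <= 1:
        raise ValueError("f must exceed 1 so that threshold decreases")
    alive = set(adj)
    threshold = len(adj) / f
    bound = k
    iterations = 0
    while alive:
        low = {v for v in alive if len(adj[v] & alive) < bound}
        if not low:               # every surviving vertex has degree >= bound
            break
        iterations += 1
        if len(low) >= threshold:
            alive -= low          # discard step within the current phase
        else:
            bound /= 4            # start a new phase with weaker targets
            threshold /= f
    return alive, bound, iterations


# Hypothetical example: a 5-clique on 0..4 with a path 5-6-7-8 hanging off vertex 0.
example = {v: set() for v in range(9)}
clique = [(u, v) for u in range(5) for v in range(u + 1, 5)]
for u, v in clique + [(0, 5), (5, 6), (6, 7), (7, 8)]:
    example[u].add(v)
    example[v].add(u)
print(find_subgraph2(example, 4, 3))
```

Calling it with `f` equal to the square root of the number of vertices mirrors the
two-loop structure of FindSubgraph1, while smaller values of `f` trade a weaker degree
guarantee for fewer iterations.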

If different values are used for f(n), an interesting time/performance trade-off is
exhibited. When f(n) = log n the algorithm is in NC and a subgraph with minimum degree
$\Omega(kn^{-\epsilon})$, for any ε > 0, is found. Time and performance figures are given in the table
below. The time is given as the total number of iterations, neglecting the factor of

4.5. Finding a Maximum Density Subgraph

An alternate approach to constructing a subgraph with high minimum degree is to
look for a maximum density subgraph. The results that we get by this approach are
stronger than the previous results for certain cases, although the resulting algorithms are
probabilistic. The density of a graph is the ratio of the number of edges to the number
of vertices, so a maximum density subgraph is an induced subgraph for which this ratio is
as large as possible. We denote a maximum density subgraph of the graph G by MD(G)
and the maximum density by MD(G). Our first lemma shows that a maximum density
subgraph has high degree.

Lemma 4.5. A maximum density subgraph of G has all vertices of degree at least MD(G).

Proof: If the maximum density subgraph had a vertex of degree less than MD(G), then
the density would be increased by deleting that vertex. ∎

The maximum density subgraph is related to the optimization problem discussed in
the previous section. Computing the value of the maximum density subgraph gives an
approximation of HDS(G) that is guaranteed to be within a factor of two.

Lemma  4.6.  2 Af  75(G)  >  77 DS[G)  >  WD{G). 

Proof: Let $H$ be a subgraph of $G$ with minimum degree $HDS(G)$. Since $H$ has at least
$\frac{1}{2}|H|\,HDS(G)$ edges, its density is at least $\frac{1}{2}HDS(G)$. The density of $H$ is no greater than
the maximum density, so $2\,MD(G) \ge HDS(G)$. Lemma 4.5 implies that $HDS(G) \ge MD(G)$. ∎
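
The inequality of Lemma 4.6 can be checked by brute force on a very small graph. The sketch below is only an illustration: the function names and the example graph (a triangle with one pendant vertex) are our own assumptions and are not taken from the text, and the exhaustive search over vertex subsets is exponential, so it is unrelated to the RNC algorithm discussed in this section.

```python
from fractions import Fraction
from itertools import combinations


def induced_edges(subset, edges):
    """Edges of the subgraph induced by the given vertex subset."""
    chosen = set(subset)
    return [(u, v) for u, v in edges if u in chosen and v in chosen]


def max_density(vertices, edges):
    """Largest ratio (#edges / #vertices) over all nonempty induced subgraphs."""
    best = Fraction(0)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            density = Fraction(len(induced_edges(subset, edges)), size)
            best = max(best, density)
    return best


def high_degree(vertices, edges):
    """Largest d such that some induced subgraph has minimum degree d."""
    best = 0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            sub = induced_edges(subset, edges)
            min_degree = min(sum(1 for e in sub if v in e) for v in subset)
            best = max(best, min_degree)
    return best


# Hypothetical example graph: triangle a-b-c with a pendant vertex d on a.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
md = max_density(vertices, edges)
hds = high_degree(vertices, edges)
```

On this example the whole graph and the triangle both have density one, so the whole graph is a maximum density subgraph whose minimum degree is one, while the triangle has minimum degree two.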

The minimum degree of a maximum density subgraph can differ from $HDS(G)$ by a
factor of two. An example of such a graph is:


The value of $HDS(G)$ is two, while the minimum degree of a maximum density subgraph
is just one.

The maximum density subgraph can be constructed by an RNC algorithm. The
algorithm for constructing a maximum density subgraph relies on a reduction due to
Goldberg [Go], which reduces finding a maximum density subgraph to a unit capacity
network flow problem. The flow problem is solved using the techniques of [KUW].

Theorem 4.6. The problem of finding a maximum density subgraph can be reduced in
log-space to the problem of finding a maximum flow in a unit capacity network. ∎

Corollary 4.1. A maximum density subgraph can be found in RNC. ∎

There is a slight difference in the types of approximations achieved by our two
algorithms. The first approximation constructs a graph which is a supergraph of $HDS_k(G)$.
The maximum density subgraph is an approximation to $HDS_d(G)$, where $d = HDS(G)$.
The maximum density subgraph is not necessarily a supergraph of $HDS_d(G)$. In the
following example, $HDS_d(G)$ is the entire graph, while the maximum density subgraph is
just the component on the right.


4.6.  Number  Problems 

One class of problems where approximation is particularly important is number
problems. A number problem is one in which arbitrary sized integers can be part of the problem.
For example, the numbers could be weights of edges or coefficients of linear constraints. It
is quite common for number problems to have natural approximations. For problems that
have an objective function which is being optimized, an approximate solution is one that is
close to the optimum. Number problems can be approximated in other ways as well. For
example, an approximate solution to a packing problem could be one where the constraints
are violated by a small amount. Some number problems can be solved efficiently if the
numbers are small. Problems of this type can often be approximated by modifying the
problem by truncating the values, solving the modified problem exactly, and then relating
the solution of the modified problem to an approximate solution of the original problem.
In this section we give a couple of examples of problems where this technique is used in
parallel approximation algorithms.

The first problem that we look at is network flow. This problem illustrates how the
difficulty in a problem can be caused by the presence of large numbers. The problem of
determining a maximum flow in a network is P-complete [GSS]. The P-completeness proof
uses large integers as capacities to encode the circuit value problem. If the capacities are
bounded by a polynomial in the number of edges, then the problem can be solved in RNC
[KUW]. We show that for every $\epsilon > 0$, a flow that differs by a factor of at most $1 + \epsilon$ from
the maximum flow can be found in RNC.

Theorem 4.7. For any $\epsilon > 0$, maximum flow can be approximated to a factor of $1 + \epsilon$ by
an RNC algorithm.

Proof: Let $G = (V,E)$, $|V| = n$, be an instance of network flow with edge capacities $c_e$
for each $e \in E$. Suppose that the maximum flow has size $f$. We give an RNC algorithm
that finds a flow of size $\tilde{f}$ where $f - \tilde{f} \le \epsilon f$. The basic idea in the approximation
algorithm is to truncate the values of the edge capacities so that they can be represented
by numbers with $O(\log n + \log \frac{1}{\epsilon})$ bits. The modified flow problem can then be solved using
the Karp-Upfal-Wigderson matching algorithm.

First we need bounds on the maximum flow $f$. Let $c$ be the maximum capacity such
that there is a path from $s$ to $t$ using only edges of capacity at least $c$. The network must
have a cut that is made up of edges with capacity at most $c$, so $f \le n^2 c$. Since there is
a path from $s$ to $t$ with edges having capacity at least $c$, $c \le f$. Since the flow is at most
$n^2 c$, there is a maximum flow where no edge carries flow of more than $n^2 c$, so any edge
with capacity greater than $n^2 c$ can be replaced by an edge with capacity exactly $n^2 c$.

Let $j = \max(0, \lfloor \log \frac{\epsilon c}{n^2} \rfloor)$. We now create a new network with capacities
$\tilde{c}_e = 2^j \lfloor c_e 2^{-j} \rfloor$. This is setting the $j$ least significant bits of each capacity to 0. Since the
largest capacity is at most $cn^2$, the number of significant bits in the modified capacities is
at most $\lceil \log cn^2 \rceil - j \le \lceil \log cn^2 \rceil - \lfloor \log \frac{\epsilon c}{n^2} \rfloor \le 1 + 4 \log n + \log \frac{1}{\epsilon}$. The problem can then
be solved in $O(\log^k n \log \frac{1}{\epsilon})$ time to get an approximate flow $\tilde{f}$. The most that the
approximate flow $\tilde{f}$ could differ from the original flow $f$ is
$f - \tilde{f} \le \sum_e (c_e - \tilde{c}_e) \le n^2 2^j \le \epsilon c \le \epsilon f$.

Thus the algorithm achieves the desired approximation. ∎
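
The truncation step of the proof can be sketched as follows. This is a minimal illustration under stated assumptions, not the RNC algorithm itself: the function name, the dictionary representation of the network, and the example values are hypothetical, the bottleneck capacity $c$ is supplied by the caller, capacities are assumed to be positive integers, and the flow computation on the truncated network is omitted.

```python
import math


def truncate_capacities(capacities, n, c, eps):
    """Clear the j low-order bits of every capacity.

    capacities: dict mapping an edge to a positive integer capacity (assumed format)
    n:          number of vertices
    c:          largest value such that some s-t path uses only edges of capacity >= c
    eps:        desired relative error
    """
    # j = max(0, floor(log2(eps * c / n^2))), as in the proof.
    j = max(0, math.floor(math.log2(eps * c / (n * n))))
    limit = n * n * c  # no edge needs more capacity than n^2 * c
    truncated = {}
    for edge, cap in capacities.items():
        cap = min(cap, limit)
        truncated[edge] = (cap >> j) << j  # equals 2^j * floor(cap / 2^j)
    return j, truncated


# Hypothetical four-vertex network; the widest s-t path is s-a-t with bottleneck 512.
capacities = {("s", "a"): 1000, ("a", "t"): 512, ("s", "b"): 37, ("b", "t"): 37}
j, truncated = truncate_capacities(capacities, n=4, c=512, eps=0.5)
total_loss = sum(capacities[e] - truncated[e] for e in capacities)
```

Each capacity loses less than $2^j$, so the total loss over at most $n^2$ edges stays below $\epsilon c$, which is the quantity bounded in the last inequality of the proof.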

A second number problem that can be approximated is list scheduling. This is a
simple scheduling problem that involves scheduling jobs on two processors. The problem
is:

Given a list of jobs $j_1, \ldots, j_n$, each with an execution time $t(j_i) \in Z^+$, construct
a two processor schedule such that $j_i$ is started no later than $j_{i+1}$ and there is no
idle time until after all jobs have been started.

A list schedule can be computed by considering the jobs in order and assigning a job to
the first processor that becomes available. The problem of computing a list schedule is
P-complete [HM1]. The P-completeness proof requires the use of large numbers for the job
times. The computation of a circuit is encoded by the times that the jobs are scheduled,
with certain bits of the times corresponding to the values of the gates. However, the
problem can be solved by a fast parallel algorithm if the numbers are small. Helmbold and
Mayr [HM1] have shown that if the job times are bounded by $L(n)$, then a schedule can
be computed in $O(\log L(n) \log n)$ time using $O(n^2)$ processors.

For list scheduling, we define an approximate solution to be a schedule that has the
same first come, first served property as a list schedule, but we allow idle time between
the jobs. The smaller the total idle time, the better the approximation. Using an NC
algorithm to compute a list schedule for problems with small job times, we can construct
an NC algorithm to approximate list scheduling with the idle time an arbitrarily small
fraction of the schedule length.


Theorem 4.8. For all $\epsilon > 0$, list scheduling can be approximated by an NC algorithm
such that the proportion of the idle time is less than $\epsilon$.

Proof: The algorithm for approximating a list schedule rounds the job times up so that
they are multiples of $2^k$ with $O(\log n)$ significant bits and then solves the modified problem
exactly. Let $j_1, \ldots, j_n$ be a list of jobs with job times $t(j_i)$, and let $\epsilon > 0$. Let $t$ be
the largest of the job times. If $t < \frac{n}{\epsilon}$, the problem can be solved exactly in NC, so
suppose $t \ge \frac{n}{\epsilon}$. Suppose $2^k \le \frac{\epsilon t}{n} < 2^{k+1}$. We create a modified problem with job times
$\tilde{t}(j_i) = 2^k \lceil t(j_i) 2^{-k} \rceil$. The modified problem can be solved exactly, and then the jobs of the
original problem can be scheduled at the times of the modified problem. The idle time after
job $j_i$ is run is at most $\tilde{t}(j_i) - t(j_i) < 2^k$. The total idle time is at most $n 2^k \le \epsilon t$.
Since the length of the schedule is at least $t$, the proportion of idle time is at most $\epsilon$. ∎
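
The rounding argument can be illustrated with a short sequential sketch. It is not the NC algorithm of [HM1]: the list schedule is computed by the obvious sequential loop, the function names and the sample job times are our own assumptions, and job times are assumed to be positive integers.

```python
import math


def list_schedule(times):
    """Two-processor list schedule: each job goes to the processor that frees up first.

    Returns a list of (start_time, processor) pairs, one per job, in list order.
    """
    free_at = [0, 0]
    starts = []
    for t in times:
        p = 0 if free_at[0] <= free_at[1] else 1
        starts.append((free_at[p], p))
        free_at[p] += t
    return starts


def approximate_list_schedule(times, eps):
    """Round job times up to multiples of 2^k and schedule the rounded jobs.

    Returns (starts, idle), where idle is the total slack introduced by rounding.
    """
    n = len(times)
    t_max = max(times)
    if t_max * eps < n:  # small job times: schedule exactly, no idle time
        return list_schedule(times), 0
    k = math.floor(math.log2(eps * t_max / n))  # 2^k <= eps*t/n < 2^(k+1)
    unit = 1 << k
    rounded = [unit * -(-t // unit) for t in times]  # ceiling to a multiple of 2^k
    idle = sum(r - t for r, t in zip(rounded, times))
    return list_schedule(rounded), idle


# Hypothetical instance with four jobs.
times = [1000, 300, 700, 45]
starts, idle = approximate_list_schedule(times, eps=0.5)
```

Running the original jobs at the start times computed for the rounded jobs keeps the first come, first served order, and the slack after each job is less than $2^k$.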

The approximation algorithms for network flow and list scheduling both depend on
being able to solve the problems efficiently when the numbers are small. However, some
problems remain difficult when the numbers are small. In the theory of NP-completeness, a
distinction is made between weakly and strongly NP-complete problems in order to classify
number problems. If an NP-complete problem remains NP-complete when restricted to
instances of the problem involving only small numbers, it is NP-complete in the strong
sense, while if it can be solved in polynomial time when the numbers are small, the problem
is NP-complete in the weak sense. We can make the same distinction for P-complete
problems.

Definition 4.1. A P-complete problem is strongly P-complete if there exists a polynomial
$p$ such that the problem remains P-complete when restricted to instances $I$ with the largest
number bounded by $p(|I|)$.

A P-complete problem that is not a number problem, such as the circuit value
problem, is P-complete in the strong sense, so the distinction is only interesting for number
problems. An important example of a number problem that is P-complete in the strong
sense is linear programming. Linear programming is a number problem since the
coefficients of the equations can be arbitrary integers. Cook has shown (see [HR]) that the
problem of determining if a set of linear inequalities has a solution is P-complete. This
problem remains P-complete when the coefficients are restricted to +1 and -1, so linear
programming is P-complete in the strong sense. In light of this result, a fast parallel
approximation for linear programming is unlikely. In the next section we present another
strongly P-complete problem.

4.7.  First  Fit  Bin  Packing 

In this section and the next, we examine the problem of computing a first fit decreasing
(FFD) bin packing. We show that the problem is strongly P-complete, and that the
problem can be approximated in a reasonable sense. The bin packing problem is:

Given a list of items $U = u_1, \ldots, u_n$ with sizes $s(u_i) \le 1$ for $u_i \in U$, find an
assignment of the items to unit capacity bins such that the number of bins used
is as small as possible. The sum of the sizes of the items assigned to a bin must
be at most one.

For ease of exposition, we refer to $s(u_i)$ just as $u_i$. Bin packing is a well known NP-complete
problem [GJ3]. Sequential approximation schemes to bin packing have been developed and
extensively analyzed [GJ3]. An important approximation algorithm for bin packing is the
first fit algorithm. First fit considers the items one at a time, and places each item in the
first bin with enough room. If the list is sorted so that the items are non-increasing, then
the algorithm is first fit decreasing (FFD), and if the list is sorted so that the items are
non-decreasing, the algorithm is first fit increasing (FFI). An FFI packing is within $\frac{17}{10}$ of
optimal and an FFD packing is within $\frac{11}{9}$ of optimal.
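
For reference, the sequential first fit rule and its two sorted variants can be sketched as below. The function names are our own, and the example uses integer sizes with a bin capacity of 10 in place of unit capacity bins so that the arithmetic is exact; both choices are assumptions made for illustration.

```python
def first_fit(items, capacity):
    """Place each item in the first bin with enough room.

    Returns (assignment, remaining): the bin index chosen for each item, in the
    order the items were given, and the space left in each bin.
    """
    remaining = []
    assignment = []
    for size in items:
        for idx, space in enumerate(remaining):
            if size <= space:
                remaining[idx] -= size
                assignment.append(idx)
                break
        else:  # no open bin had room, so open a new one
            remaining.append(capacity - size)
            assignment.append(len(remaining) - 1)
    return assignment, remaining


def first_fit_decreasing(items, capacity):
    """First fit applied to the items in non-increasing order."""
    return first_fit(sorted(items, reverse=True), capacity)


def first_fit_increasing(items, capacity):
    """First fit applied to the items in non-decreasing order."""
    return first_fit(sorted(items), capacity)


sizes = [5, 7, 5, 2, 3, 2, 1]
ffd_assignment, ffd_remaining = first_fit_decreasing(sizes, capacity=10)
ffi_assignment, ffi_remaining = first_fit_increasing(sizes, capacity=10)
```

The inner scan over the open bins is the inherently sequential part: where an item lands depends on the placement of every earlier item.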

The result of this section is that it is P-complete in the strong sense to compute an
FFD packing. The problem of computing an FFD packing is a number problem, since the
items may have arbitrary sizes. However, our result shows that this problem is difficult
even if the item sizes are “small.” Many P-complete number problems, such as network flow
and list scheduling, are only P-complete in the weak sense. This is one of the first results
that shows a number problem to be P-complete in the strong sense. The result shows that
the difficulty in computing an FFD packing comes from the arrangement of items
in the bins rather than from the numbers involved in the problem.

The strong P-completeness result suggests that a scaling approach is not likely to lead
to a good approximation to an FFD packing. However, in the next section we show that a
reasonable approximation to an FFD packing can be computed by an NC algorithm. We
show that if the item sizes are bounded below by a constant, then the FFD packing can
be computed in $O(\log n)$ time. This gives us an NC algorithm to compute a packing that
obeys the same performance bound as an FFD packing.

Theorem 4.9. The problem of computing a first fit decreasing packing is strongly
P-complete.

Proof: The proof is a reduction from the monotone circuit value problem. The reduction
has two stages. The first stage reduces the monotone circuit value problem to computing
an FFD packing into bins of variable size (the sizes of the bins are specified as part of the
instance). The second stage reduces computing an FFD packing into bins of variable size
to computing an FFD packing into unit capacity bins.

Let $\beta = \beta_1, \ldots, \beta_n$ be a monotone circuit. We transform $\beta$ into a non-increasing list
of items and a list of bins. There is a distinguished item $u$ and a distinguished bin $b$. The
item $u$ is placed in $b$ by a first fit packing if and only if the output of the circuit is true.

For each gate there is a list of items and a list of bins. The items and bins are ordered
by gate number, so if $i < j$, the items and bins for gate $\beta_i$ come before the items and bins
for gate $\beta_j$. Among the items for gate $\beta_i$ are two pairs of items, $T_i$, $T_i$ and $F_i$, $F_i$, which
indicate the values of the inputs to the gate. Exactly two of these items are placed by the
first fit packing in the bins for $\beta_i$; the other two are packed in bins for lower numbered
gates. The two that are placed in the bins for $\beta_i$ give the value for the inputs to the gate.
If the output of the gate $\beta_i$ is connected to the gates $\beta_j$ and $\beta_k$, the bins for $\beta_i$ get either
$T_j$, $T_k$ or $F_j$, $F_k$, depending upon the value of the gate. If $T_j$ and $T_k$ are placed in the bins
for $\beta_i$, then the gates $\beta_j$ and $\beta_k$ receive false values for $\beta_i$

[Figure: Packing for AND gate $\beta_i$ with one true input and one false input]

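Suppose there are $n$ gates in the circuit. Let $\delta_i$ = 1 — j and $\epsilon$ = gr-^-yr. The items
for gate $\beta_i$ have sizes:

$\delta_i$, $\delta_i$, $\delta_i - 4\epsilon$, $\delta_i - 4\epsilon$, $\delta_i - 4\epsilon$, $\delta_i - 5\epsilon$, $\delta_i - 5\epsilon$, $\delta_i - 8\epsilon$, $\delta_i - 8\epsilon$.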

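The first four items are $T_i$, $T_i$, $F_i$, and $F_i$ respectively. The list of items is non-increasing.
The bins for an AND gate $\beta_i$ with outputs to $\beta_j$ and $\beta_k$ have sizes:

$2\delta_i - 8\epsilon$, $2\delta_i - 10\epsilon$, $\delta_i + \delta_j - 8\epsilon$, $\delta_i + \delta_k - 8\epsilon$.

The bins for an OR gate $\beta_i$ with outputs to $\beta_j$ and $\beta_k$ have sizes:

$2\delta_i - 8\epsilon$, $\delta_i$, $2\delta_i - 10\epsilon$, $\delta_i + \delta_j - 8\epsilon$, $\delta_i + \delta_k - 8\epsilon$.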

The first two bins evaluate the value of the gate and the last three bins propagate the
value. Packings of the AND gate for the inputs TT and TF are illustrated in the figure
below. The OR gate is similar. For gates $\beta_i$ that have a constant input, either a $T_i$ or an
$F_i$ is deleted to give the gate the right input.

To finish the proof, we give a reduction of FFD packing with variable bin sizes to an
FFD packing with unit capacity bins. Let $u_1, \ldots, u_q$ be a list of items and $b_1, \ldots, b_r$ be
a list of bins. We construct a list of decreasing items $v_1, \ldots, v_{2r}$ which when packed into
bins of size $C$ leave space $b_i$ in the $i$-th bin. The packing of $u_1, \ldots, u_q$ into $b_1, \ldots, b_r$ is
transformed into packing $v_1, \ldots, v_{2r}, u_1, \ldots, u_q$ into bins of size $C$. The sizes can then be
normalized to give a packing into unit capacity bins.

Let $b$ be the largest of the $b_i$'s and $C = (2r + 1)b$. Without loss of generality we
assume that all the $b_i$'s have size greater than 0. The sizes of the items are:

$v_t = C - tb - b_t$, if $t \le r$;
$v_t = C - tb$, if $t > r$.

The items $v_t$ and $v_{2r+1-t}$ are put into the $t$-th bin, leaving $b_t$ empty space. The list
$v_1, \ldots, v_{2r}$ is non-increasing.

The largest number involved in the reduction is polynomial in the size of the circuit,
so the proof shows that it is strongly P-complete to compute an FFD packing. ∎
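
The arithmetic of the second stage can be checked directly. The sketch below only constructs the padding items and verifies that the pair $v_t$, $v_{2r+1-t}$ leaves exactly $b_t$ free in a bin of size $C$; it does not simulate the first fit packing. The function names and the sample bin sizes are assumptions, and integer sizes are used so that the check is exact.

```python
def padding_items(bin_sizes):
    """Build C and the items v_1, ..., v_2r for the given variable bin sizes."""
    r = len(bin_sizes)
    b = max(bin_sizes)          # b is the largest bin size
    C = (2 * r + 1) * b
    items = []
    for t in range(1, 2 * r + 1):
        if t <= r:
            items.append(C - t * b - bin_sizes[t - 1])
        else:
            items.append(C - t * b)
    return C, items


def leftover_space(bin_sizes):
    """Space left in the t-th bin of size C after v_t and v_(2r+1-t) are placed."""
    C, v = padding_items(bin_sizes)
    r = len(bin_sizes)
    # v is stored 0-indexed, so v_t is v[t - 1] and v_(2r+1-t) is v[2r - t].
    return [C - v[t - 1] - v[2 * r - t] for t in range(1, r + 1)]


bin_sizes = [3, 5, 2]            # hypothetical variable bin sizes b_1, b_2, b_3
C, items = padding_items(bin_sizes)
space = leftover_space(bin_sizes)
```

The sum $v_t + v_{2r+1-t} = 2C - (2r+1)b - b_t = C - b_t$ does not depend on which bin is largest, which is why the leftover space reproduces the original bin sizes.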


4.8.  Approximating  an  FFD  packing 

Although it is probably not possible to compute a first fit decreasing packing in NC,
we can approximate a first fit decreasing packing in a reasonable sense. We show that if
the sizes of all of the items are at least $\frac{1}{k}$, then an FFD packing can be computed in NC.
Thus we can find a packing that agrees with FFD on all of the big items. As a corollary to
our result, we show how to construct a packing in NC that obeys the same performance
bound as FFD.

Our algorithm for computing an FFD packing of items of size at least $\frac{1}{k}$ is relatively
simple. The basic idea is to decompose the problem into a series of packing problems that
have a simple structure. We show that the number of subproblems is independent of the
number of items in the list, although it does depend upon $k$.

The FFD algorithm is broken into phases, with the $j$-th phase packing all of the items
in the interval $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$. The bins which are filled to at least $1 - \frac{1}{2^{j+1}}$ cannot be of use
in the $j$-th phase, so they are ignored for the phase. The bins which are filled to less than
$1 - \frac{1}{2^{j+1}}$ consist of groups of consecutive bins in which the amount of remaining space
is increasing, as is illustrated in the figure below. We prove that the number of groups
is bounded by a constant depending on $j$ but not on the number of items. The packing
into groups is done sequentially, first by packing into the first group, then into the second
group, and so on. The routine Pack computes the packing of items of size $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$ into a
list of increasing sized bins in $O(\log n)$ time. The details of the routine Pack are described
below. The algorithm FFD runs in $O(\log n)$ time since the number of phases and the
number of calls to Pack is constant.

[Figure: Bins filled to at most $1 - \frac{1}{2^{j+1}}$]

FFD($U = u_1, \ldots, u_n$);
begin
    for $j \leftarrow 1$ to $k - 1$ do
    begin
        $U_j \leftarrow \{u_i \in U \mid \frac{1}{2^{j+1}} \le u_i < \frac{1}{2^j}\}$;
        Let $B'$ be the sublist of bins filled to less than $1 - \frac{1}{2^{j+1}}$;
        Divide $B'$ into groups of consecutive bins such that the empty space in each
            group is increasing;
        for each group $G$ do
            Pack($U_j$, $G$);
    end
end.
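
The grouping step inside a phase can be sketched as follows. The function name and the list representation are our own; the sketch assumes that "increasing" allows ties, so a new group starts only where the free space strictly drops, and it takes the bins filled to less than $1 - \frac{1}{2^{j+1}}$ to be those with more than $\frac{1}{2^{j+1}}$ free space.

```python
def increasing_groups(space, j):
    """Split the usable bins of phase j into maximal runs with non-decreasing space.

    space: free space of each bin, in bin order (values between 0 and 1)
    j:     phase number; bins with at most 1/2^(j+1) free space are ignored
    Returns a list of groups, each a list of bin indices into `space`.
    """
    threshold = 1 / 2 ** (j + 1)
    usable = [i for i, s in enumerate(space) if s > threshold]
    groups = []
    for i in usable:
        if groups and space[groups[-1][-1]] <= space[i]:
            groups[-1].append(i)   # space did not drop: extend the current group
        else:
            groups.append([i])     # space dropped: start a new group
    return groups


# Hypothetical free-space profile for phase j = 2 (threshold 1/8).
space = [0.3, 0.4, 0.1, 0.2, 0.25, 0.5, 0.05]
groups = increasing_groups(space, j=2)
```

Bins that are too full are skipped before grouping, so a group may span bins that are not adjacent in the original list, in the same way that $B'$ is a sublist of the bins.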

The major subroutine of our bin packing algorithm is to pack a decreasing list of items
$U = u_1, \ldots, u_n$ with $u_i \in [\frac{1}{2^{j+1}}, \frac{1}{2^j})$ into an increasing list of bins. The reason that this
packing problem is easy to solve is that the resulting packing has a simple structure. The
number of items placed in the bins increases with bin number. The routine consists of a
number of phases. The $i$-th phase packs consecutive bins that can fit $i$ items each. The
items are taken from the start of the list and packed $i$ per bin until a bin is encountered that
fits $i + 1$ items. Some of the bins that received $i$ items can accommodate one additional item
from further down the list. These items are added and then the next phase is run using the
remaining items and the remaining bins. In the example below, the items $u_4, \ldots, u_9$ are
placed in consecutive bins by the second phase, and the item $u_{14}$ fits in the bin containing
$u_8$ and $u_9$. When packing items in the range $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$, there are fewer than $2^{j+1}$ phases
since no more than $2^{j+1} - 1$ items can be placed in a bin. A phase can easily be implemented
to run in $O(\log n)$ time.

Pack($U = u_1, \ldots, u_n$, $B = b_1, \ldots, b_m$)
begin
    for $i := 1$ to $t$ do
    begin
        Find $s$ such that $u_{(r-1)i+1} + \cdots + u_{ri} \le b_r < u_{(r-1)i+1} + \cdots + u_{ri+1}$ for $r \le s$ and
            $u_{si+1} + \cdots + u_{(s+1)i+1} \le b_{s+1}$.
        Place the items $u_1, \ldots, u_{si}$ into the bins $b_1, \ldots, b_s$.
        for each item $u_j$, $j \ge si + 1$
            Find the first bin $b_r$, $r \le s$ that has enough space remaining for $u_j$.
        Add the additional items to the bins $b_1, \ldots, b_s$ by placing items in the first possible
            bin, pushing overflow items towards the higher numbered bins.
        Remove the placed items from $U$ and the filled bins from $B$ and renumber the lists.
    end
end.

[Figure: An example of a packing performed by Pack]

The following lemma establishes that Pack computes a first fit packing.

Lemma 4.7. Let $U = u_1, \ldots, u_n$ be a decreasing list of items with $u_i \in [\frac{1}{2^{j+1}}, \frac{1}{2^j})$ and
$B = b_1, \ldots, b_m$ be an increasing list of bins with $b_m < 1$. The routine Pack computes a
first fit decreasing packing of $U$ into $B$.

Proof: The potential place for the algorithm to be incorrect is in the placement of the
$s$ groups of $i$ items each in the first $s$ bins of $B$. Suppose an item $u'$ that is placed in a
bin $b_r$, $r \le s$, actually fit in a bin $b_{r'}$, $r' < r$. So $u_{r'i+1} + \cdots + u_{(r'+1)i} + u' \le b_{r'}$. But
$u_{ri+1} + \cdots + u_{(r+1)i} + u_{(r+1)i+1} \le u_{r'i+1} + \cdots + u_{(r'+1)i} + u' \le b_{r'} \le b_r$. Hence, $r > s$. ∎

In order to prove that our algorithm for constructing an FFD packing of items of
size at least $\frac{1}{k}$ is a fast parallel algorithm, we must show that the number of groups of
bins encountered in the algorithm is bounded by a constant. We prove a slightly stronger
result than we need by showing that the constant is in fact only polynomial in $k$. Thus
our algorithm remains an NC algorithm even if $k$ is some slowly growing function in $n$,
such as $\log n$.

We must now show that the number of groups of consecutive bins that the algorithm
considers when packing items of size greater than $\frac{1}{k}$ is bounded by a constant $C_k$. Our
proof shows that this constant is $O(k^4)$. We prove the theorem by keeping track of the
number of intervals of bins of increasing size at the start of each phase. We can bound the
increase in the number of intervals by considering in some detail the way items are packed
into the bins.


Let $U = u_1, \ldots, u_n$ be the list of items and $B = b_1, \ldots, b_m$ be the list of bins. We
denote the amount of space left in bin $b_i$ at a given time by $s(b_i)$. This value depends
upon the phase of the packing. The projection of a packing onto a subset $B'$ of the bins
is the set of items $U'$ that are placed into the bins $B'$. It is a folk theorem of bin packing
that $U'$ is packed into $B'$ by a first fit packing in exactly the same manner as the items $U'$
are packed in the full packing. A $\frac{1}{2^j}$-projection is the projection of the packing onto the
set of bins $b_i$ such that $s(b_i) \ge \frac{1}{2^j}$. In the $j$-th phase, when we pack items in the interval
$[\frac{1}{2^{j+1}}, \frac{1}{2^j})$, we only consider the $\frac{1}{2^{j+1}}$-projection of the packing up to that point.

The major portion of this proof is accounting for groups of bins with increasing space.
Basically, an interval is a set of consecutive bins with remaining space increasing. However,
for bookkeeping reasons, we break up the intervals at powers of two, so all bins of an
interval have space in $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$ for some $j$. We also relax the condition that the bins are
consecutive; we allow some bins with less space to be between the bins of the interval.
Finally, we insist that our intervals are maximal, so that bins cannot be added to an
interval without violating one of the defining properties.

Definition 4.2. A $\frac{1}{2^j}$-interval is a set of bins $b_{t_1}, \ldots, b_{t_r}$ such that:

1. $\frac{1}{2^{j+1}} \le s(b_{t_1}) \le s(b_{t_2}) \le \cdots \le s(b_{t_r}) < \frac{1}{2^j}$.

2. Let $b_1, \ldots, b_m$ be the bins of the $\frac{1}{2^{j+1}}$-projection of the packing. The image of
$b_{t_1}, \ldots, b_{t_r}$ is a set of consecutive bins $b_s, \ldots, b_{s+r-1}$.

3. $s = 1$ or $s(b_{s-1}) > s(b_s)$.

4. $s + r - 1 = m$ or $s(b_{s+r}) \ge \frac{1}{2^j}$ or $s(b_{s+r}) < s(b_{s+r-1})$.

An important detail in our definition of an interval is that we allow intermediate bins
to have less space than the bins of the interval. The figure below shows a situation that
might occur. The interval is separated by an interval of bins with much less space. The
reason that we allow this in our definition is that we do not want to subdivide the bins
too early, or else we generate too many intervals.


Each $\frac{1}{2^j}$-interval is assigned a weight of $2^{-2j}$. We show that the sum of the weights
of the intervals is $O(4^j)$ at the start of phase $j$. This allows us to bound the number of
groups of consecutive bins that we consider in the algorithm.

We begin by examining the packing of items in the range $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$ into a set of
consecutive bins with increasing space in the range $[\frac{1}{2^{l+1}}, \frac{1}{2^l})$. Here, we assume that the
bins are fully packed, meaning that after the items are placed, none of the bins have more
than $\frac{1}{2^{j+1}}$ space. This is the type of packing that is performed by the routine Pack described
above.

The items that are packed can be divided into two types of items, the forward items
and  the  fill-in  items.  The  routine  Pack  places  items  r  per  bin  from  the  front  of  the  list 
until  a  bin  is  encountered  that  can  hold  at  least  r  +  1  items,  then  the  routine  looks  ahead 
in  the  list  of  items  and  finds  additional  items  that  can  fit  with  the  items  placed  r  per  bin. 
The  items  placed  in  groups  from  the  front  of  the  list  are  the  forward  items,  and  the  items 
placed  by  looking  ahead  in  the  list  are  the  fill-in  items. 

If we just look at the forward items, then the bins are divided into runs of bins with
increasing space. The set of bins that receive the same number of items forms a run. The
number of runs formed is at most $\frac{1/2^l}{1/2^{j+1}} = \frac{2^{j+1}}{2^l}$. The amount of space left in a bin is
at most $\frac{1}{2^{j+1}}$, so the weight of each run is less than $\sum_{i>j} 2^{-2i} = \frac{1}{3} 2^{-2j}$. The total weight
is less than $\frac{2^{j+1}}{2^l} \cdot \frac{1}{3} 2^{-2j} = \frac{2}{3} 2^{-(j+l)} < 2^{-2l}$ if $j \ge l$. Thus the weight is not increased by the
forward packing.

We now consider packing items of size $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$ into an increasing list of bins of size
$[\frac{1}{2^{j+1}}, \frac{1}{2^j})$. This covers the fill-in items.

Lemma 4.8. The number of $\frac{1}{2^i}$-intervals generated by a packing of items of size $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$
into an interval of size $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$ is at most $\frac{2^i}{2^j}$.

Proof: Let $b_1, \ldots, b_m$ be the $\frac{1}{2^{i+1}}$-projection of the packing. A reversal occurs when an
item $u_{t+1}$ is placed in a lower numbered bin than the previous item $u_t$. Each $\frac{1}{2^i}$-interval
(except possibly the first one formed) begins with a reversal. Let $u_{t+1}$ be the first item of
a $\frac{1}{2^i}$-interval, and suppose it was placed in the bin $b_s$. There is at least $\frac{1}{2^{i+1}}$ space left in $b_s$
after $u_{t+1}$ is placed. Since $u_t$ did not fit in $b_s$, $u_{t+1} + \frac{1}{2^{i+1}} < u_t$. Thus, each $\frac{1}{2^i}$-interval
other than the first one accounts for a gap of more than $\frac{1}{2^{i+1}}$ in the item sizes. Since the
difference between the first and the last item is at most $\frac{1}{2^{j+1}}$, the number of $\frac{1}{2^i}$-intervals
is at most $\frac{2^i}{2^j}$. ∎

The weight of the resulting packing is bounded by $\sum_{i \ge j} \frac{2^i}{2^j} 2^{-2i} = 2 \cdot 2^{-2j}$. The weight
is at most double the weight of the original interval.

The $j$-th phase of the algorithm packs the items in the range $[\frac{1}{2^{j+1}}, \frac{1}{2^j})$, so for the
$j$-th phase, we can neglect all of the bins with space at most $\frac{1}{2^{j+1}}$. Suppose the weight
at the start of phase $j$ is $w_j$. Since the $\frac{1}{2^j}$-intervals have weight $2^{-2j}$, there are at most
$2^{2j} w_j$ intervals to consider. However, since these intervals might overlap, some of them
may have to be split. If a $\frac{1}{2^i}$-interval encloses a $\frac{1}{2^l}$-interval ($i < l \le j$), then we split the
outer interval. The number of intervals that are added is bounded by the original number
of intervals. At most $2 \cdot 2^{2j} w_j$ intervals need to be considered in packing the $j$-th phase.

The packing of the forward items in the $\frac{1}{2^l}$-intervals ($l \le j$) does not increase the
total weight, except when a bin is only partially filled. At most one bin is partially filled
by a phase, so this adds at most a constant to the weight. (The constant is in fact at


most ^.) In computing the weight, the forward packing can be considered before the
splitting of intervals, since the forward packing cost does not depend on the bins being
consecutive. The splitting of intervals can double the number of $\frac{1}{2^j}$-intervals, so this can
double the weight. The weight is again doubled by packing into the $\frac{1}{2^j}$-intervals. Hence
$w_{j+1} \le 4 w_j + c$ for a constant $c$. Since $w_0 = 1$, we have $w_j = O(4^j)$.

When the items packed all have size at least $\frac{1}{k}$, there are at most $\log k$ phases. The
weight is then $O(k^2)$ at the end of the algorithm, so the number of groups of bins considered
in the final phase is $O(k^4)$. Thus, we have the following theorem:

Theorem 4.10. The algorithm FFD computes a first fit decreasing packing of items of
size greater than $\frac{1}{k}$ in $O(\log n)$ time. ∎

We can use our algorithm to construct a packing that obeys the same performance
bound as FFD. The algorithm for doing this combines a first fit decreasing packing and
a first fit increasing packing. The algorithm first packs all items that are in the interval
$[\frac{1}{6}, 1)$ using a first fit decreasing packing, and then packs the remaining items using a first
fit increasing packing. Let $L_D(I)$ be the length of the first fit decreasing packing and
$OPT(I)$ be the length of the optimal packing for a list $I$ of items. First we show that this packing is
relatively close to optimal.

Theorem 4.10. The length of the composite packing $L_C(I)$ satisfies

$$L_C(I) \le \max\left(L_D(I),\ \tfrac{6}{5}\,OPT(I) + 1\right) \le \tfrac{11}{9}\,OPT(I) + 4.$$

Proof: Let $L$ be the length of the FFD packing of the items with size at least $\frac{1}{6}$.
Clearly $L \le L_D(I)$, so if all the items are placed in the first $L$ bins, then $L_C(I) \le L_D(I)$.
If more than $L$ bins are used, then all bins except possibly the last one are filled to at
least $\frac{5}{6}$, so $L_C(I) \le \frac{6}{5}\,OPT(I) + 1$. ■
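
The two inequalities used in this proof can be spelled out as follows. This is a sketch that assumes the threshold between the two groups of items is $\frac{1}{6}$, so that every bin other than the last is filled to at least $\frac{5}{6}$; $\mathrm{size}(u)$ is notation introduced here for the size of item $u$.

```latex
\begin{align*}
OPT(I) \;\ge\; \sum_{u \in I} \mathrm{size}(u) \;\ge\; \tfrac{5}{6}\,\bigl(L_C(I) - 1\bigr)
  \quad &\Longrightarrow \quad L_C(I) \le \tfrac{6}{5}\,OPT(I) + 1, \\
\tfrac{6}{5} = \tfrac{54}{45} < \tfrac{55}{45} = \tfrac{11}{9}
  \quad &\Longrightarrow \quad \tfrac{6}{5}\,OPT(I) + 1 \le \tfrac{11}{9}\,OPT(I) + 4 .
\end{align*}
```

The other term of the maximum is covered by the classical bound $L_D(I) \le \frac{11}{9}\,OPT(I) + 4$ for first fit decreasing.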

The first fit increasing part of the packing can be done fairly easily with a fast parallel
algorithm. The following lemma sketches how an FFI packing is computed.

Lemma 4.9. A first fit increasing packing into variable sized bins can be computed in NC.

Proof: The property that an FFI packing has that makes it easy to compute with a fast
parallel algorithm is that the order of the items in the bins is the same as the initial order
of the items in the list. The key part of computing the FFI packing is to identify the first
item placed in each bin. One way this can be done is to compute, for each bin $b_j$ and each
item $u_i$, the first item available for bin $b_{j+1}$ assuming that $u_i$ is the first item available
for bin $b_j$. The first item for each bin can then be computed using path doubling. This
algorithm can be implemented to run in $O(\log n)$ time using $O(n^2)$ processors. ■
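
The idea in this proof can be illustrated with a small sequential simulation. The sketch below is not the thesis's implementation: the function name `ffi_first_items`, the use of prefix sums with binary search to build the table, and the binary decomposition used to read off the answers are all assumptions made for illustration. Each list comprehension stands in for one parallel round, since its entries are independent of each other.

```python
from bisect import bisect_right
from itertools import accumulate


def ffi_first_items(items, caps):
    """Return first[j], the index of the first item available for bin j.

    `items` must be sorted in increasing order and have positive sizes;
    `caps` holds the free space of each bin. first[len(caps)] is the number
    of items that were packed.
    """
    n, m = len(items), len(caps)
    prefix = [0] + list(accumulate(items))

    # hop[j][i]: first item available for bin j+1, assuming item i is the
    # first item available for bin j. Bin j takes the longest run of items
    # starting at i whose total size fits in caps[j].
    hop = [[bisect_right(prefix, prefix[i] + caps[j]) - 1 for i in range(n + 1)]
           for j in range(m)]

    # Path doubling: hops[k][j][i] is the first item available for bin
    # j + 2**k, assuming item i is the first item available for bin j.
    hops = [hop]
    span = 1
    while 2 * span <= m:
        prev = hops[-1]
        hops.append([[prev[j + span][prev[j][i]] for i in range(n + 1)]
                     for j in range(m - 2 * span + 1)])
        span *= 2

    # Read off first[t] by following the binary expansion of t from bin 0.
    first = []
    for t in range(m + 1):
        j, i = 0, 0
        for k, table in enumerate(hops):
            if (t >> k) & 1:
                i = table[j][i]
                j += 1 << k
        first.append(i)
    return first
```

For example, `ffi_first_items([1, 2, 2, 3, 4], [3, 1, 5, 10])` returns `[0, 2, 2, 4, 5]`: the first bin takes items 0 and 1, the second bin takes nothing, the third takes items 2 and 3, and the fourth takes item 4. The run for each bin is contiguous because, once an item fails to fit in a bin, every later (larger) item fails to fit as well.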

The composite packing is computed by partitioning the items into the items of size at
least $\frac{1}{6}$ and the items of size less than $\frac{1}{6}$. The first group is packed using the FFD algorithm,
and the second group is packed using FFI. A similar parallel algorithm for computing a
packing that obeys the same performance bound as FFD has been independently discovered
by Warmuth [War].
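
A sequential sketch of the composite packing may make the partition concrete. It is only an illustration of what is computed, not of the parallel algorithm: the function names `first_fit` and `composite_packing` are hypothetical, the threshold of 1/6 is taken as an assumption, and bins are assumed to have unit capacity.

```python
from fractions import Fraction


def first_fit(items, free):
    """Place each item, in the given order, into the first bin with enough
    free space, opening a new unit-capacity bin when none fits.
    `free` lists the remaining space per bin and is updated in place."""
    for x in items:
        for j, space in enumerate(free):
            if x <= space:
                free[j] -= x
                break
        else:
            free.append(1 - x)
    return free


def composite_packing(items, threshold=Fraction(1, 6)):
    """Return the number of bins used by FFD on the large items followed by
    FFI on the small items (threshold value is an assumption)."""
    large = sorted((x for x in items if x >= threshold), reverse=True)
    small = sorted(x for x in items if x < threshold)
    free = first_fit(large, [])        # first fit decreasing on large items
    return len(first_fit(small, free)) # first fit increasing on the rest
```

Using `Fraction` sizes avoids floating-point rounding when an item fills a bin exactly. For instance, the items 1/2, 1/2, 1/3, 1/10, 1/10 are packed into two bins: FFD fills one bin with the two halves and starts a second with 1/3, and FFI then places both small items in the second bin.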


4.0.  Discussion 


The results on approximating P-complete problems are similar to approximation
results for NP-complete problems. Our results in this chapter illustrate a number of points of
similarity. Our result on approximating the high degree subgraph problem shows that it is
possible to get tight bounds on the degree of approximation that is feasible by an NC
algorithm, assuming that P ≠ NC. This result parallels a number of results on approximating
NP-complete problems with parallel algorithms. Some P-complete number problems can
be solved in NC when the numbers involved are small. Efficient parallel approximation
schemes can often be found for problems of this type. Other number problems remain
difficult when they are restricted to instances that involve small numbers. The notion
of strong P-completeness captures this, being analogous to strong NP-completeness. The
problem of computing an FFD bin packing is strongly P-complete. This problem can still
be approximated in a reasonable sense, since an FFD packing can be computed in NC if
the sizes of the items are bounded below by a constant.